Abstract

There has been recent interest in understanding the power of local algorithms for
optimization and inference problems on sparse graphs. Gamarnik and Sudan (2014) showed that
local algorithms are weaker than global algorithms for finding large independent sets in sparse
random regular graphs, thus refuting a conjecture by Hatami, Lovász, and Szegedy (2012).
Montanari (2015) showed that local algorithms are suboptimal for finding a community with high
connectivity in sparse Erdős-Rényi random graphs. For the symmetric planted partition
problem (also called community detection for block models) on sparse graphs, a simple
observation is that local algorithms cannot have non-trivial performance.

In this work we consider the effect of side information on local algorithms for community
detection under the binary symmetric stochastic block model. In the block model with side
information each of the $n$ vertices is labeled $+$ or $-$ independently and uniformly at random;
each pair of vertices is connected independently with probability $a/n$ if both of them have the
same label or $b/n$ otherwise. The goal is to estimate the underlying vertex labeling given 1) the
graph structure and 2) side information in the form of a vertex labeling positively correlated
with the true one. Assuming that the ratio between in and out degree $a/b$ is $\Theta(1)$ and the
average degree $(a+b)/2 = n^{o(1)}$, we show that a local algorithm, namely, belief propagation run
on the local neighborhoods, maximizes the expected fraction of vertices labeled correctly in the
following three regimes:

• $|a - b| < 2$ and all $0 < \alpha < 1/2$

• $(a - b)^2 > C(a + b)$ for some constant $C$ and all $0 < \alpha < 1/2$

• For all $a, b$ if the probability that each given vertex label is incorrect is at most $\alpha^*$ for
some constant $\alpha^* \in (0, 1/2)$.

Thus, in contrast to the case of independent sets or a single community in random graphs
and to the case of symmetric block models without side information, we show that local
algorithms achieve optimal performance in the above three regimes for the block model with side
information.

To complement our results, in the large degree limit $a \to \infty$, we give a formula for the
expected fraction of vertices labeled correctly by the local belief propagation, in terms of a fixed
point of a recursion derived from the density evolution analysis with Gaussian approximations.


*Research supported by NSF grants CCF 1320105, DOD ONR grant N00014-14-1-0823, and grant 328025 from
the Simons Foundation. E.M. is with the Department of Statistics, The Wharton School, University of
Pennsylvania, Philadelphia, PA and with the Departments of Statistics and Computer Science, U.C.
Berkeley, Berkeley CA, mossel@wharton.upenn.edu.

†Research supported by DOD ONR Grant N00014-14-1-0823, and Grant 328025 from the Simons
Foundation. J.X. is with the Department of Statistics, The Wharton School, University of Pennsylvania,
Philadelphia, PA, jiamingx@wharton.upenn.edu.





1 Introduction 


In this work we study the performance of local algorithms for community detection in sparse graphs,
thus combining two lines of work which saw recent breakthroughs.

The question of the optimality of local algorithms for optimization problems on large graphs
was raised by Hatami, Lovász, and Szegedy [24] in the context of a theory of graph limits for sparse
graphs. Their conjecture, regarding the optimality of local algorithms for finding independent sets
in random graphs, was refuted by Gamarnik and Sudan [20]. More recently, Montanari [35] showed that
local algorithms are strictly suboptimal compared to global exhaustive search for finding a
community with high connectivity in the sparse Erdős-Rényi random graph.

In a different direction, following a beautiful conjecture from physics [17], new efficient
algorithms for the stochastic block models (i.e. planted partition) were developed and shown to detect
the blocks whenever this is information theoretically possible [39, 37, 31, 9]. It is easy to see
(e.g. [27]) that no local algorithm with access to neighborhoods of radius $o(\log n)$ can have
non-trivial performance for this problem.

Our interest in this paper is in the application of local algorithms for community detection with
side information on community structures. The motivation is twofold: 1) from a theoretical
perspective it is interesting to ask what is the effect of side information on the existence of optimal
local algorithms; 2) from the application perspective, it is important to know how to efficiently
exploit side information in addition to the graph structure for community detection. We show that,
unlike the cases of independent sets on regular graphs or the case of community detection on sparse
random graphs without side information, local algorithms do have optimal performance.

1.1 Local algorithms 

Local algorithms for optimization problems on sparse graphs are algorithms that determine whether
each vertex belongs to the solution based only on a small-radius neighborhood around the vertex.
Such algorithms are allowed to have access to independent random variables associated with each
vertex.

A simple example of a local algorithm is the following classical algorithm for finding
independent sets in graphs. Attach to each node $v$ an independent uniform random variable $U_v$. Let
the independent set consist of all the vertices whose $U_v$ value is greater than that of all of their
neighbors. See Definition 2.2 for a formal definition of a local algorithm and [30, 24, 20, 35] for
more background on local algorithms.
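The classical algorithm just described can be sketched in a few lines of Python. This is a minimal illustration rather than code from the cited works; the function name `local_independent_set` and the adjacency-list representation are assumptions made here.

```python
import random

def local_independent_set(adj, rng):
    """Select every vertex whose random value beats all of its neighbours' values.

    adj: list of neighbour lists (hypothetical representation chosen for this sketch).
    rng: a random.Random instance supplying the independent uniforms U_v.
    """
    u = [rng.random() for _ in adj]  # one independent uniform U_v per vertex
    # The decision for v only reads U_w for w in the radius-1 neighbourhood of v.
    return {v for v in range(len(adj)) if all(u[v] > u[w] for w in adj[v])}

# Usage: a cycle on 10 vertices.
cycle = [[(v - 1) % 10, (v + 1) % 10] for v in range(10)]
chosen = local_independent_set(cycle, random.Random(0))
```

Two adjacent vertices cannot both be selected, since each would need the strictly larger value, so the output is always an independent set.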

There are many motivations for studying local algorithms. These algorithms are efficient: for
example, for bounded degree graphs they run in linear time in the size of the graph, and for graphs
with maximal degree $\mathrm{polylog}(n)$ they run in time $n \times \mathrm{polylog}(n)$. By design,
these algorithms are also easy to run in a distributed fashion. Moreover, the existence of local
algorithms implies correlation decay properties that are of interest in statistical physics, graph
limit theory and ergodic theory. Indeed, the existence of a local algorithm implies that the solution
in one part of the graph is independent of the solution in a far away part; see [30, 24, 20] for a
more formal and comprehensive discussion.

A striking conjecture of Hatami, Lovász, and Szegedy [24] stated that local algorithms are able
to find independent sets of the maximal possible density in random regular graphs. This conjecture
was refuted by Gamarnik and Sudan [20]. The work of Gamarnik and Sudan [20] highlights the
role of long range correlation and clustering in the solution space as obstacles for the optimality of
local algorithms. Refining the methods of Gamarnik and Sudan, Rahman and Virág [42] showed
that local algorithms cannot find independent sets of size larger than half of the optimal density.

1.2 Community detection in sparse graphs

The stochastic block model is one of the most popular models for networks with clusters. The
model has been extensively studied in statistics [25, 44, 8, 10, 47, 21], computer science (where it
is called the planted partition problem) [19, 26, 16, 32, 15, 14, 12, 4, 13] and theoretical statistical
physics [17, 48, 18]. In the simplest binary symmetric form, it assumes that $n$ vertices are assigned
into two clusters, or equivalently labeled with $+$ or $-$, independently and uniformly at random;
each pair of vertices is connected independently with probability $a/n$ if both of them are in the
same cluster or $b/n$ otherwise.

In the dense regime with $a = \Omega(\log n)$, it is possible to exactly recover the clusters from the
observation of the graph. A sharp exact recovery threshold has been found in [1, 38] and it is further
shown that semi-definite programming can achieve the sharp threshold in [22, 5]. More recently,
exact recovery thresholds have been identified in a more general setting with a fixed number of
clusters [23, 46], and with heterogeneous cluster sizes and edge probabilities [2, 41].

Real networks are often sparse with bounded average degrees. In the sparse setting with
$a = O(1)$, exact recovery of the clusters from the graph becomes hopeless as the resulting graph under
the stochastic block model will have many isolated vertices. Moreover, it is easy to see that even
vertices with constant degree cannot be labeled accurately even if all the other vertices' labels are
revealed. Thus the goal in the sparse regime is to find a labeling that has a non-trivial or maximal
correlation with the true one (up to permutation of cluster labels). It was conjectured in [17] and
proven in [39, 37, 31] that nontrivial detection is feasible if and only if $(a - b)^2 > 2(a + b)$. A
spectral method based on the non-backtracking matrix is shown to achieve the sharp threshold in
[9]. In contrast, a simple argument in [27] shows that no local algorithm running on neighborhoods
of radius $o(\log n)$ can attain nontrivial detection.

1.3 Community detection with side information 

The community detection problem under the stochastic block model is an idealization of a network
inference problem. In many realistic settings, in addition to network information, some partial
information about vertices' labels is also available. There has been much recent work in the machine
learning and applied networks communities on combining vertex and network information (see for
example [11, 6, 7, 40]). In this paper, we ask the following natural but fundamental question:

With the help of partial information about vertices' labels, can local algorithms achieve the
optimal detection probability?

This question has two motivations: 1) from a theoretical perspective we would like to understand 
how side information affects the existence of optimal local algorithms; 2) from the application 
perspective, it is important to develop fast community detection algorithms which exploit side 
information in addition to the graph structure. 

There are two natural models for side information of community structures: 

• A model where a small random fraction of the vertices' labels is given accurately. This model
was considered in a number of recent works in physics and computer science [17, 45, 3, 27].
The emerging conjectured picture is that in the case of the binary symmetric stochastic block
model, the local application of BP is able to achieve the optimal detection probability. This
is stated formally as one of the main conjectures of [27], where it is proven in an asymptotic
regime where the fraction of revealed information goes to 0 and assuming $(a - b)^2 > C(a + b)$
for some large constant $C$.

• The model considered in this paper is where noisy information is provided for each vertex.
Specifically, for each vertex, we observe a noisy label which is the same as its true label
with probability $1 - \alpha$ and different with probability $\alpha$, independently at random, for some
$\alpha \in [0, 1/2)$.

For this model, by assuming that $a/b = \Theta(1)$ and the average degree $(a + b)/2 = n^{o(1)}$ is
smaller than all powers of $n$, we prove that local application of belief propagation
maximizes the expected fraction of vertices labeled correctly, i.e., achieves the optimal detection
probability, in the following regimes:

- $|a - b| < 2$,

- $(a - b)^2 > C(a + b)$ for some constant $C$,

- $\alpha \le \alpha^*$ for some constant $0 < \alpha^* < 1/2$.

Note that this proves the conjectured picture in a wide range of the parameters. In particular,
compared to the results of [27], we prove the conjecture in the whole regime
$\{(a - b)^2 > C(a + b)\} \times \{\alpha \in (0, 1/2)\}$, while in [27] the result is only proven for the
limiting interval of this region $\{(a - b)^2 > C(a + b)\} \times \{\alpha' \to 0\}$, where each vertex's
true label is revealed with probability $\alpha'$.

In the large degree limit $a \to \infty$ we further provide a simple formula for the expected fraction
of vertices labeled correctly by BP, in terms of a fixed point of a recursion, based on the
density evolution analysis. Density evolution has been used for the analysis of sparse graph
codes [43, 33], and more recently for the analysis of finding a single community in a sparse
graph [35].

2 Model and main results 

We next present a formal definition of the model followed by a formal statement of the main results. 

2.1 Model 

We consider the binary symmetric stochastic block model with two clusters. This is a random
graph model on $n$ vertices, where we first independently assign each vertex to one of the clusters
uniformly at random, and then independently draw an edge between each pair of vertices with
probability $a/n$ if the two vertices are in the same cluster or $b/n$ otherwise. Let $\sigma_i = +$ if
vertex $i$ is in the first cluster and $\sigma_i = -$ otherwise.

Let $G = G_n = (V, E)$ denote the observed graph (without the labels $\sigma$). Let $\tilde\sigma$ be an
$\alpha$-noisy version of $\sigma$: for each vertex $i$ independently, $\tilde\sigma_i = \sigma_i$ with
probability $1 - \alpha$ and $\tilde\sigma_i = -\sigma_i$ with probability $\alpha$, where
$\alpha \in [0, 1/2)$ is a fixed constant. Hence, $\tilde\sigma$ can be viewed as the side information
for the cluster structure.
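To make the model concrete, the following sketch samples the labels, the graph, and the noisy side information exactly as described above. It is an illustration under assumed conventions: labels are encoded as `+1`/`-1`, the graph is stored as adjacency lists, and the names `sample_block_model` and `fraction_correct` are hypothetical.

```python
import random

def sample_block_model(n, a, b, alpha, rng):
    """Sample (sigma, adjacency lists, noisy labels) from the block model with side information."""
    sigma = [rng.choice((1, -1)) for _ in range(n)]      # uniform labels
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            p = (a if sigma[i] == sigma[j] else b) / n   # a/n within, b/n across
            if rng.random() < p:
                adj[i].append(j)
                adj[j].append(i)
    # Each label is flipped independently with probability alpha.
    noisy = [-s if rng.random() < alpha else s for s in sigma]
    return sigma, adj, noisy

def fraction_correct(sigma, estimate):
    """Empirical fraction of vertices labeled correctly by an estimate."""
    return sum(s == e for s, e in zip(sigma, estimate)) / len(sigma)
```

The pairwise loop costs time quadratic in `n`, which is fine for experiments at small scale; a faster sampler would be needed for large sparse graphs.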

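Definition 2.1. The detection problem with side information is the inference problem of inferring
$\sigma$ from the observation of $(G, \tilde\sigma)$. The estimation accuracy for an estimator
$\hat\sigma$ is defined by

$$ p_{G_n}(\hat\sigma) = \frac{1}{n} \sum_{i=1}^{n} \mathbb{P}\{\sigma_i = \hat\sigma_i\}, \tag{1} $$

which equals the expected fraction of vertices labeled correctly. Let $p^*_{G_n}$ denote the optimal
estimation accuracy.

The optimal estimator in maximizing the success probability $\mathbb{P}\{\sigma_i = \hat\sigma_i\}$ is the
maximum a posteriori (MAP) estimator, which is given by
$2 \times \mathbf{1}_{\{\mathbb{P}\{\sigma_i = + | G, \tilde\sigma\} \ge \mathbb{P}\{\sigma_i = - | G, \tilde\sigma\}\}} - 1$,
and the maximum success probability is
$\frac{1}{2} \mathbb{E}\left[ \left| \mathbb{P}\{\sigma_i = + | G, \tilde\sigma\} - \mathbb{P}\{\sigma_i = - | G, \tilde\sigma\} \right| \right] + \frac{1}{2}$.
Hence, the optimal estimation accuracy $p^*_{G_n}$ is given by

$$
\begin{aligned}
p^*_{G_n} &= \frac{1}{2n} \sum_{i=1}^{n} \mathbb{E}\left[ \left| \mathbb{P}\{\sigma_i = + | G, \tilde\sigma\} - \mathbb{P}\{\sigma_i = - | G, \tilde\sigma\} \right| \right] + \frac{1}{2} \\
&= \frac{1}{2} \mathbb{E}\left[ \left| \mathbb{P}\{\sigma_i = + | G, \tilde\sigma\} - \mathbb{P}\{\sigma_i = - | G, \tilde\sigma\} \right| \right] + \frac{1}{2},
\end{aligned}
\tag{2}
$$

where the second equality holds due to symmetry. However, computing the MAP estimator is
computationally intractable in general, and it is unclear whether the optimal estimation accuracy
$p^*_{G_n}$ can be achieved by some estimator computable in polynomial time.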
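For very small graphs the MAP estimator can still be evaluated by brute force, which helps make the definition concrete. The sketch below enumerates all $2^n$ labelings; it is an illustration only (the names `posterior_plus` and `map_estimate` are hypothetical), and it assumes $0 < \alpha < 1/2$ and $0 < a, b < n$ so that every logarithm is finite.

```python
import itertools
import math

def posterior_plus(adj, noisy, a, b, alpha):
    """Return P{sigma_i = + | G, noisy labels} for every vertex by enumeration.

    adj: n x n symmetric 0/1 matrix; noisy: list of +1/-1 labels.
    The cost is 2^n, so this is only usable for very small n.
    """
    n = len(noisy)
    plus_weight = [0.0] * n
    total = 0.0
    for sigma in itertools.product((1, -1), repeat=n):
        log_w = 0.0  # log P{G, noisy | sigma}; the uniform prior cancels
        for i in range(n):
            log_w += math.log(1 - alpha if sigma[i] == noisy[i] else alpha)
            for j in range(i + 1, n):
                p = (a if sigma[i] == sigma[j] else b) / n
                log_w += math.log(p if adj[i][j] else 1 - p)
        w = math.exp(log_w)
        total += w
        for i in range(n):
            if sigma[i] == 1:
                plus_weight[i] += w
    return [pw / total for pw in plus_weight]

def map_estimate(posterior):
    """MAP rule 2 * 1{P(+) >= P(-)} - 1, applied vertex by vertex."""
    return [1 if p >= 0.5 else -1 for p in posterior]
```

A quick sanity check: when $a = b$ the graph carries no information, so the posterior of each vertex reduces to $1 - \alpha$ or $\alpha$ depending on its noisy label.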

In this paper, we focus on the regime:

$$ \frac{a}{b} = \Theta(1), \qquad a = n^{o(1)}, \qquad \text{as } n \to \infty. \tag{3} $$

It is well known that in the regime $a = n^{o(1)}$, a local neighborhood of a vertex is with high
probability a tree. Thus, it is natural to study the performance of local algorithms. We next
present a formal definition of local algorithms, which is a slight variant of the definition in [35].

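Let $\mathcal{G}^*$ denote the space of graphs with one distinguished vertex and labels $+$ or $-$ on each
vertex. An estimator $\hat\sigma$ can be viewed as a function $\hat\sigma : \mathcal{G}^* \to \{\pm\}$,
which maps $(G, \tilde\sigma, u)$ to $\hat\sigma_u$ for every $(G, \tilde\sigma, u) \in \mathcal{G}^*$.

Definition 2.2. Given a $t \in \mathbb{N}$, an estimator $\hat\sigma$ is $t$-local if there exists a function
$\mathcal{F} : \mathcal{G}^* \to \{\pm\}$ such that for all $(G, \tilde\sigma, u) \in \mathcal{G}^*$,

$$ \hat\sigma(G, \tilde\sigma, u) = \mathcal{F}\left(G_u^t, \tilde\sigma_{G_u^t}\right), $$

where $G_u^t$ is the subgraph of $G$ induced by vertices whose distance to $u$ is at most $t$; the
distinguished vertex is $u$, and each vertex $i$ in $G_u^t$ has label $\tilde\sigma_i$;
$\tilde\sigma_{G_u^t}$ is the restriction of $\tilde\sigma$ to vertices in $G_u^t$.
Moreover, we call an estimator $\hat\sigma$ local if it is $t$-local for some fixed $t$, regardless of
the graph size $n$.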
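The definition can be illustrated by a small sketch that extracts the radius-$t$ neighborhood with breadth-first search and applies a rule to it. The helper names and the toy `majority_rule` are assumptions made for illustration; the rule is just one example of a function of the labeled ball and is not an estimator studied in this paper.

```python
from collections import deque

def neighborhood(adj, noisy, u, t):
    """Return (vertices, edges, labels) of the subgraph induced by vertices within distance t of u."""
    dist = {u: 0}
    queue = deque([u])
    while queue:
        v = queue.popleft()
        if dist[v] == t:
            continue  # do not explore beyond distance t
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    vertices = set(dist)
    edges = {(v, w) for v in vertices for w in adj[v] if w in vertices and v < w}
    labels = {v: noisy[v] for v in vertices}
    return vertices, edges, labels

def majority_rule(vertices, edges, labels, u):
    """Toy rule: majority vote of the noisy labels in the ball, ties broken by u's own label."""
    s = sum(labels.values())
    return labels[u] if s == 0 else (1 if s > 0 else -1)

def t_local_estimator(adj, noisy, t, rule):
    """Apply the same rule to the radius-t neighborhood of every vertex."""
    return [rule(*neighborhood(adj, noisy, u, t), u) for u in range(len(adj))]
```

Because the rule only sees the labeled ball around `u`, the resulting estimator is $t$-local by construction.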


We can potentially allow local algorithms to access local independent uniform random variables
as defined in [24, 20]. Since our main results show that the local BP algorithm, which does not
need to access external randomness, is already optimal, the extra randomness is not needed in our
context.

2.2 Local belief propagation algorithm 

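It is well known (see e.g. [35, Lemma 4.3]) that the local belief propagation algorithm, as defined in
Algorithm 1, maximizes the estimation accuracy among local algorithms, provided that the graph
is locally tree-like. Thus we focus on studying the local BP. Specifically, let $\partial i$ denote the
set of neighbors of $i$ and define

$$ R^t_{i \to j} = h_i + \sum_{\ell \in \partial i \setminus \{j\}} F\left(R^{t-1}_{\ell \to i}\right), \tag{4} $$

with initial conditions $R^0_{i \to j} = \gamma$ if $\tilde\sigma_i = +$ and $R^0_{i \to j} = -\gamma$ if
$\tilde\sigma_i = -$, for all $i \in [n]$ and $j \in \partial i$.
Then we approximate
$\frac{1}{2} \log \frac{\mathbb{P}\{G, \tilde\sigma | \sigma_u = +\}}{\mathbb{P}\{G, \tilde\sigma | \sigma_u = -\}}$
by $R^t_u$ given by

$$ R^t_u = h_u + \sum_{\ell \in \partial u} F\left(R^{t-1}_{\ell \to u}\right). \tag{5} $$

Here $F$ is the function defined in Lemma 3.4, $\gamma = \frac{1}{2} \log \frac{1-\alpha}{\alpha}$ as in
Theorem 2.4, and $h_i = \gamma \tilde\sigma_i$ is the external field of Section 3 with $\tilde\sigma$ in
place of $\tilde\tau$.

Algorithm 1 Local belief propagation with side information

1: Input: $n \in \mathbb{N}$, $a > b > 0$, $\alpha \in [0, 1/2)$, adjacency matrix $A \in \{0,1\}^{n \times n}$, and $t \in \mathbb{N}$.

2: Initialize: Set $R^0_{i \to j} = 0$ for all $i \in [n]$ and $j \in \partial i$.

3: Run $t - 1$ iterations of message passing as in (4) to compute $R^{t-1}_{i \to j}$ for all $i \in [n]$ and $j \in \partial i$.

4: Compute $R^t_i$ for all $i \in [n]$ as per (5).

5: Return $\hat\sigma^t_{\rm BP}$ with $\hat\sigma^t_{\rm BP}(i) = 2 \times \mathbf{1}_{\{R^t_i \ge 0\}} - 1$.

We remark that in each BP iteration, the number of outgoing messages to compute is $O(|E|)$, where
$|E|$ is the total number of edges; each outgoing message needs to process $d'$ incoming messages on
average, where $d'$ is the average number of edges incident to an edge chosen uniformly at random.
Thus each BP iteration runs in time $O(|E| d')$ and $\hat\sigma^t_{\rm BP}$ is computable in time
$O(t |E| d')$. In the sparse graph with $a = O(1)$, $\hat\sigma^t_{\rm BP}$ runs in time linear in the
size of the graph. For graphs with maximal degree $\mathrm{polylog}(n)$, it runs in time
$n \, \mathrm{polylog}(n)$.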
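The message-passing recursion (4) and the final aggregation (5) can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes labels encoded as `+1`/`-1`, adjacency lists, $0 < \alpha < 1/2$ (so that $\gamma$ is finite), the external field $h_i = \gamma \tilde\sigma_i$ with $\gamma = \frac{1}{2}\log\frac{1-\alpha}{\alpha}$, $F(x) = \tanh^{-1}(\tanh(\beta)\tanh(x))$ with $\tanh(\beta) = (a-b)/(a+b)$, and the initial messages $\pm\gamma$ stated with (4). The name `local_bp` is hypothetical.

```python
import math

def local_bp(adj, noisy, a, b, alpha, t):
    """Run t levels of belief propagation with side information; return +1/-1 estimates."""
    n = len(adj)
    gamma = 0.5 * math.log((1 - alpha) / alpha)
    theta = (a - b) / (a + b)                      # equals tanh(beta), beta = 0.5*log(a/b)
    F = lambda x: math.atanh(theta * math.tanh(x))
    h = [gamma * s for s in noisy]                 # external fields from the noisy labels
    # One message per directed edge (i, j), initialised from the sender's noisy label.
    msg = {(i, j): h[i] for i in range(n) for j in adj[i]}
    for _ in range(t - 1):
        incoming = [sum(F(msg[(l, i)]) for l in adj[i]) for i in range(n)]
        # Recursion (4): sum over all neighbours of i except the recipient j.
        msg = {(i, j): h[i] + incoming[i] - F(msg[(j, i)]) for (i, j) in msg}
    # Aggregation (5): sum over all neighbours.
    belief = [h[i] + sum(F(msg[(l, i)]) for l in adj[i]) for i in range(n)]
    return [1 if r >= 0 else -1 for r in belief]

# Usage: a star whose centre has a wrong noisy label but five correctly labeled leaves.
star = [[1, 2, 3, 4, 5], [0], [0], [0], [0], [0]]
estimate = local_bp(star, [-1, 1, 1, 1, 1, 1], a=5.0, b=1.0, alpha=0.3, t=1)
```

Subtracting the recipient's contribution from a precomputed sum keeps each iteration linear in the number of directed edges; this is a design choice of the sketch rather than something prescribed by Algorithm 1.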

2.3 Main results 

Theorem 2.3. Consider the detection problem with side information assuming that $a/b = \Theta(1)$ and
that $a = n^{o(1)}$. Let $\hat\sigma^t_{\rm BP}$ denote the estimator given by belief propagation applied
for $t$ iterations, as defined in Algorithm 1. Then

$$ \lim_{t \to \infty} \limsup_{n \to \infty} \left( p^*_{G_n} - p_{G_n}(\hat\sigma^t_{\rm BP}) \right) = 0 $$

in the following three regimes:

• $|a - b| < 2$,

• $(a - b)^2 > C(a + b)$ for some constant $C$,

• $\alpha \le \alpha^*$ for some $0 < \alpha^* < 1/2$.

In other words, in each of these regimes a local application of belief propagation provides an optimal
detection probability.

The above results should be contrasted with the case with no side information available, where
it is known, see e.g. [27], that BP applied for $t = o(\log n)$ iterations cannot recover a partition
better than random, i.e., cannot achieve non-trivial detection.

In the large degree regime, we further derive an asymptotic formula for $p_{G_n}(\hat\sigma^t_{\rm BP})$
in terms of a fixed point of a recursion.

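Theorem 2.4. Consider the regime (3). Assume further that as $n \to \infty$, $a \to \infty$ and
$(a - b)/\sqrt{b} \to \mu$, where $\mu$ is a fixed constant. Let
$h(v) = \mathbb{E}\left[\tanh(v + \sqrt{v} Z + U)\right]$, where $Z \sim \mathcal{N}(0, 1)$; $U$ is
independent of $Z$ and $U = \gamma$ with probability $1 - \alpha$ and $U = -\gamma$ with probability
$\alpha$, where $\gamma = \frac{1}{2} \log \frac{1-\alpha}{\alpha}$.
Define $\underline{v}$ and $\bar{v}$ to be the smallest and largest fixed point of
$v = \frac{\mu^2}{4} h(v)$, respectively.
Let $\hat\sigma^t_{\rm BP}$ denote the estimator given by belief propagation applied for $t$ iterations,
as defined in Algorithm 1. Then,

$$ \lim_{t \to \infty} \lim_{n \to \infty} p_{G_n}(\hat\sigma^t_{\rm BP}) = 1 - \mathbb{E}\left[ Q\left( \frac{\underline{v} + U}{\sqrt{\underline{v}}} \right) \right], $$

$$ \limsup_{n \to \infty} p^*_{G_n} \le 1 - \mathbb{E}\left[ Q\left( \frac{\bar{v} + U}{\sqrt{\bar{v}}} \right) \right], $$

where $Q(x) = \int_x^{\infty} \frac{1}{\sqrt{2\pi}} e^{-y^2/2} \, dy$. Moreover, $\underline{v} = \bar{v}$ and
$\lim_{t \to \infty} \lim_{n \to \infty} p_{G_n}(\hat\sigma^t_{\rm BP}) = \limsup_{n \to \infty} p^*_{G_n}$
in the following three regimes:

• $|\mu| < 2$,

• $|\mu| > C$ for some constant $C$,

• $\alpha \le \alpha^*$ for some $0 < \alpha^* < 1/2$.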
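The quantities in Theorem 2.4 are easy to evaluate numerically. The sketch below is an illustration under stated assumptions: $h(v)$ is approximated by trapezoidal integration over $Z$ on a truncated grid, and the fixed-point equation is iterated from $v = 0$ and from $v = \mu^2/4$ (an upper bound on any fixed point, since $h \le 1$) as a way of probing the smallest and largest fixed points. All function names are hypothetical, and the value $1 - \alpha$ returned at $v = 0$ is the limit of the formula as $v \to 0$.

```python
import math

def gamma_of(alpha):
    return 0.5 * math.log((1 - alpha) / alpha)

def h(v, alpha, grid=2001, zmax=8.0):
    """Approximate h(v) = E[tanh(v + sqrt(v) Z + U)] by the trapezoidal rule in Z."""
    g = gamma_of(alpha)
    step = 2 * zmax / (grid - 1)
    total = 0.0
    for k in range(grid):
        z = -zmax + k * step
        weight = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
        if k == 0 or k == grid - 1:
            weight *= 0.5
        arg = v + math.sqrt(v) * z
        # Average over U = +gamma (prob. 1 - alpha) and U = -gamma (prob. alpha).
        total += weight * ((1 - alpha) * math.tanh(arg + g) + alpha * math.tanh(arg - g))
    return total * step

def iterate_fixed_point(mu, alpha, v0, iters=200):
    """Iterate v <- (mu^2 / 4) h(v) starting from v0."""
    v = v0
    for _ in range(iters):
        v = 0.25 * mu * mu * h(v, alpha)
    return v

def Q(x):
    """Standard Gaussian tail probability."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def accuracy(v, alpha):
    """Evaluate 1 - E[Q((v + U) / sqrt(v))]."""
    if v == 0:
        return 1 - alpha
    g, r = gamma_of(alpha), math.sqrt(v)
    return 1 - ((1 - alpha) * Q((v + g) / r) + alpha * Q((v - g) / r))

# Usage: mu = 1.5 lies in the regime |mu| < 2 of the theorem.
v_low = iterate_fixed_point(1.5, 0.25, 0.0)
v_high = iterate_fixed_point(1.5, 0.25, 1.5 ** 2 / 4)
```

Since $\tanh(\gamma) = 1 - 2\alpha$, one has $h(0) = (1 - 2\alpha)^2$, which gives a convenient check of the numerical integration.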


2.4 Proof ideas 

The proof of Theorem 2.3 follows ideas from [39, 36].

• To bound from above the accuracy of an arbitrary estimator, we bound its accuracy for a
specific random vertex $u$. Following [39], we consider an estimator that, in addition to the
graph structure and the noisy labels, is also given the exact labels of all vertices at distance
exactly $t$ from $u$. As in [39], it is possible to show that the best estimator in this case is given
by BP for $t$ levels using the exact labels at distance $t$.

• The only difference between our application of BP and the BP upper bound above is the
quality of information at distance exactly $t$ from vertex $u$. Our goal is now to analyze the
recursion of random variables defining BP in both cases and show that they converge to the same
value given exact or noisy information at level $t$.

• In the two cases where 1) $(a - b)^2 > C(a + b)$ and 2) $\alpha$ is small, our proof follows the
pattern of [36]. We note however that the paper [36] did not consider side information and the
adaptation of the proof is far from trivial. Similar to the setup in [36], the noisy labels at the
boundary, i.e., level $t$, play the role of an initialization of the recursion. However, the noisy
labels inside the tree result in less symmetric recursions that need to be controlled. Finally,
in the case where $\alpha$ is small, they play a novel role as the reason behind the contraction of
the recursion.

• The case where $|a - b| < 2$ corresponds to the uniqueness regime. Here the recursion converges
to the same value if all the vertices at level $t$ are $+$ or all vertices at level $t$ are $-$. This
implies that it converges to the same value for all possible values at level $t$.

The proof of Theorem 2.4 instead follows the idea of density evolution [43, 33], which was recently
used for analyzing the problem of finding a single community in a sparse graph [35].

• The neighborhood of a vertex $u$ is locally tree-like and thus the incoming messages to vertex
$u$ from its neighbors in BP iterations are independent. In the large degree limit, the sum of
incoming messages is distributed as Gaussian conditional on its label. Moreover, its mean
and variance admit a simple recursion over $t$, which converges to a fixed point as $t \to \infty$.

• As we pointed out earlier, the only difference between our application of BP and the BP
upper bound discussed above is the quality of information at distance exactly $t$ from vertex
$u$. Hence, the mean and variance for both BPs satisfy the same recursion but with different
initialization. If there is a unique fixed point of the recursion for mean and variance, then
the mean and variance for both BPs converge to the same values as $t \to \infty$.

• The case $|\mu| < 2$ exactly corresponds to the regime below the Kesten-Stigum bound [28]. In
this case, we can show that the recursion is a contraction mapping and thus has a unique
fixed point.

2.5 Conjectures and open problems 

There are many interesting conjectures and open problems resulting from this work. First, we
believe that local BP with side information always achieves optimal estimation accuracy.

Conjecture 2.5. Under the binary symmetric stochastic block model with $\alpha$-noisy side information,
$\lim_{t \to \infty} \lim_{n \to \infty} p_{G_n}(\hat\sigma^t_{\rm BP}) = \limsup_{n \to \infty} p^*_{G_n}$
holds for all $a$, $b$, and $\alpha$.

In the large degree regime with $a \to \infty$, $a = n^{o(1)}$ and $(a - b)/\sqrt{b} \to \mu$,
Theorem 2.4 implies that the above conjecture is true if $v = \mu^2 h(v)/4$ always has a unique fixed
point. Through simulations, we find that $v = \mu^2 h(v)/4$ seems to have a unique fixed point for all
$\mu$ and $\alpha$, and the asymptotically optimal estimation accuracy is depicted in Fig. 1.



Figure 1: Numerical calculation of E

We are only able to show that $v = \mu^2 h(v)/4$ has a unique fixed point if $|\mu| < 2$, but we
believe that it is true for all $\mu$.

Conjecture 2.6. For all $|\mu| \ge 2$ and $\alpha \in (0, 1/2)$, $v = \mu^2 h(v)/4$ has a unique fixed point.

It is tempting to prove Conjecture 2.6 by showing that $h(v)$ is concave in $v$ for all $\alpha \in (0, 1/2)$.
However, through numerical experiments depicted in Fig. 2, we find that $h(v)$ is convex around
$v = 0$ when $\alpha < 0.1$.














Figure 2: Numerical calculation of $h'(v)$ (y axis) versus $v \in [0, 10]$ (x axis) with different $\alpha$.

In this work, we assume that there is a noisy label for every vertex in the graph. Previous work
[27] instead assumes that a fraction of vertices have true labels. However, in practice, it is easy
neither to get noisy labels for every vertex nor to get true labels. Thus, there arises an interesting
question: are local algorithms still optimal with noisy labels available only for a small fraction of
vertices?

Moreover, we only studied the binary symmetric stochastic block model as a starting point. It 
would be of great interest to study to what extent our results generalize to the case with multiple 
clusters. Finally, the local algorithms are powerless in the symmetric stochastic block model simply 
because the local neighborhoods are statistically uncorrelated with the cluster structure. It is 
intriguing to investigate whether the local algorithms are optimal when the clusters are of different 
sizes or connectivities. 

2.6 Paper outline 

The rest of the paper is organized as follows. We introduce the related inference problems on
the Galton-Watson tree model in the next section, and show that the estimation accuracy of BP and the
optimal estimation accuracy can be related to the estimation accuracy on the tree model. Section
4 shows the optimality of BP in the uniqueness regime $|a - b| < 2$. The proof of the optimality
of BP in the high SNR regime $(a - b)^2 > C(a + b)$ is presented in Section 5. Section 6 proves the
optimality of BP when the side information is very accurate. Finally, Section 7 characterizes the
estimation accuracy of BP and the optimal estimation accuracy in the large degree regime based
on density evolution analysis with Gaussian approximations.

3 Inference problems on Galton-Watson tree model 

A key to understanding the inference problem on the graph is understanding the corresponding 
inference problem on Galton-Watson trees. We introduce the problem now. 

Definition 3.1. For a vertex $u$, we denote by $(T_u, \tau, \tilde\tau)$ the following Poisson two-type
branching process tree rooted at $u$, where $\tau$ is a $\pm$ labeling of the vertices of $T_u$. Let
$\tau_u$ be chosen from $\{\pm\}$ uniformly at random. Now recursively for each vertex $i$ in $T_u$,
given its label $\tau_i$, $i$ will have $L_i \sim \mathrm{Pois}(a/2)$ children $j$ with $\tau_j = +\tau_i$
and $M_i \sim \mathrm{Pois}(b/2)$ children $j$ with $\tau_j = -\tau_i$. Finally for each vertex $i$, let
$\tilde\tau_i = \tau_i$ with probability $1 - \alpha$ and $\tilde\tau_i = -\tau_i$ with probability $\alpha$.
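The branching process of Definition 3.1 can be sampled recursively up to a finite depth. The sketch below is illustrative only: the nested-dictionary representation, the function names, and the use of Knuth's multiplication method for Poisson sampling (adequate for small means) are choices made here, not part of the paper.

```python
import math
import random

def poisson(lam, rng):
    """Sample Pois(lam) by Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k

def sample_tree(a, b, alpha, depth, rng, tau=None):
    """Sample the labeled tree down to the given depth as nested dictionaries."""
    if tau is None:
        tau = rng.choice((1, -1))                    # root label, uniform on {+, -}
    noisy = -tau if rng.random() < alpha else tau    # alpha-noisy copy of the true label
    children = []
    if depth > 0:
        for _ in range(poisson(a / 2, rng)):         # children with the same label
            children.append(sample_tree(a, b, alpha, depth - 1, rng, tau))
        for _ in range(poisson(b / 2, rng)):         # children with the opposite label
            children.append(sample_tree(a, b, alpha, depth - 1, rng, -tau))
    return {"tau": tau, "noisy": noisy, "children": children}
```

Setting `b = 0` and `alpha = 0` gives a degenerate check: every vertex then carries the root's label and every noisy label is exact.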

It follows that the distribution of $\tau$ conditional on $\tilde\tau$ and a finite $T_u$ is given by

$$ \mathbb{P}\{\tau | \tilde\tau, T_u\} \propto \exp\left( \beta \sum_{i \sim j} \tau_i \tau_j + \sum_{i} h_i \tau_i \right), $$

where $\beta = \frac{1}{2} \log \frac{a}{b}$, $h_i = \frac{1}{2} \tilde\tau_i \log \frac{1-\alpha}{\alpha}$,
and $i \sim j$ means that $i$ and $j$ are connected in $T_u$. Observe
that $\mathbb{P}\{\tau | \tilde\tau, T_u\}$ is an Ising distribution on the tree with external fields given by $h$.

Let $T_i^t$ denote the subtree of $T_u$ rooted at vertex $i$ of depth $t$, and $\partial T_i^t$ denote the
set of vertices at the boundary of $T_i^t$. With a slight abuse of notation, let $\tau_A$ denote the
vector consisting of labels of vertices in $A$, where $A$ could be either a set of vertices or a subgraph
in $T_u$. Similarly we define $\tilde\tau_A$. We first consider the problem of estimating $\tau_u$ given
observation of $T_u^t$, $\tau_{\partial T_u^t}$, and $\tilde\tau_{T_u^t}$. Notice
that the true labels of vertices in $T_u^{t-1}$ are not observed in this case.

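Definition 3.2. The detection problem with side information in the tree and exact information at
the boundary of the tree is the inference problem of inferring $\tau_u$ from the observation of
$T_u^t$, $\tau_{\partial T_u^t}$, and $\tilde\tau_{T_u^t}$. The success probability for an estimator
$\hat\tau_u(T_u^t, \tau_{\partial T_u^t}, \tilde\tau_{T_u^t})$ is defined by

$$ p_{T^t}(\hat\tau_u) = \frac{1}{2} \mathbb{P}\{\hat\tau_u = + | \tau_u = +\} + \frac{1}{2} \mathbb{P}\{\hat\tau_u = - | \tau_u = -\}. $$

Let $p^*_{T^t}$ denote the optimal success probability.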

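It is well-known that the optimal estimator in maximizing $p_{T^t}$ is the maximum a posteriori
(MAP) estimator. Since the prior distribution of $\tau_u$ is uniform over $\{\pm\}$, the MAP estimator is the
same as the maximum likelihood (ML) estimator, which can be expressed in terms of the log likelihood
ratio:

$$ \hat\tau_{\rm ML} = 2 \times \mathbf{1}_{\{\Lambda_u^t \ge 0\}} - 1, $$

where

$$ \Lambda_i^t \triangleq \frac{1}{2} \log \frac{\mathbb{P}\{T_i^t, \tau_{\partial T_i^t}, \tilde\tau_{T_i^t} | \tau_i = +\}}{\mathbb{P}\{T_i^t, \tau_{\partial T_i^t}, \tilde\tau_{T_i^t} | \tau_i = -\}} $$

for all $i$ in $T_u$. Moreover, the optimal success probability $p^*_{T^t}$ is given by

$$ p^*_{T^t} = \frac{1}{2} \mathbb{E}\left[ |X_u^t| \right] + \frac{1}{2}, \tag{6} $$

where $X_i^t$ is known as the magnetization, given by

$$ X_i^t \triangleq \mathbb{P}\{\tau_i = + | T_i^t, \tau_{\partial T_i^t}, \tilde\tau_{T_i^t}\} - \mathbb{P}\{\tau_i = - | T_i^t, \tau_{\partial T_i^t}, \tilde\tau_{T_i^t}\} $$

for all $i$ in $T_u$. In view of the identity $\tanh^{-1}(x) = \frac{1}{2} \log \frac{1+x}{1-x}$, we have that
$\tanh^{-1}(X_i^t) = \Lambda_i^t$.

We then consider the problem of estimating $\tau_u$ given observation of $T_u^t$ and $\tilde\tau_{T_u^t}$.
Notice that in this case the true labels of vertices in $T_u^t$ are not observed.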





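Definition 3.3. The detection problem with side information in the tree is the inference problem of
inferring $\tau_u$ from the observation of $T_u^t$ and $\tilde\tau_{T_u^t}$. The success probability for an
estimator $\hat\tau_u(T_u^t, \tilde\tau_{T_u^t})$ is defined by

$$ q_{T^t}(\hat\tau_u) = \frac{1}{2} \mathbb{P}\{\hat\tau_u = + | \tau_u = +\} + \frac{1}{2} \mathbb{P}\{\hat\tau_u = - | \tau_u = -\}. $$

Let $q^*_{T^t}$ denote the optimal success probability.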

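We remark that the only difference between Definition 3.2 and Definition 3.3 is that the exact
labels at the boundary of the tree are revealed to estimators in the former and hidden in the latter.
The optimal estimator in maximizing $q_{T^t}$ can also be expressed in terms of the log likelihood ratio:

$$ \hat\tau_{\rm ML} = 2 \times \mathbf{1}_{\{\Gamma_u^t \ge 0\}} - 1, $$

where

$$ \Gamma_i^t \triangleq \frac{1}{2} \log \frac{\mathbb{P}\{T_i^t, \tilde\tau_{T_i^t} | \tau_i = +\}}{\mathbb{P}\{T_i^t, \tilde\tau_{T_i^t} | \tau_i = -\}} $$

for all $i$ in $T_u$. Moreover, the optimal success probability $q^*_{T^t}$ is given by

$$ q^*_{T^t} = \frac{1}{2} \mathbb{E}\left[ |Y_u^t| \right] + \frac{1}{2}, \tag{7} $$

where the magnetization $Y_i^t$ is given by

$$ Y_i^t \triangleq \mathbb{P}\{\tau_i = + | T_i^t, \tilde\tau_{T_i^t}\} - \mathbb{P}\{\tau_i = - | T_i^t, \tilde\tau_{T_i^t}\} $$

for all $i$ in $T_u$. Again we have that $\tanh^{-1}(Y_i^t) = \Gamma_i^t$.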

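The log likelihood ratios and magnetizations can be computed via the belief propagation
algorithm. The following lemma gives recursive formulas to compute $\Lambda_i^t$ ($\Gamma_i^t$) and
$X_i^t$ ($Y_i^t$): no approximations are needed. Notice that $\Lambda_i^t$ and $\Gamma_i^t$ satisfy the
same recursion but with different initialization; similarly for $X_i^t$ and $Y_i^t$. Let $\partial i$
denote the set of children of vertex $i$.

Lemma 3.4. Define $F(x) = \tanh^{-1}(\tanh(\beta) \tanh(x))$. Then for $t \ge 1$,

$$ \Lambda_i^t = h_i + \sum_{\ell \in \partial i} F\left(\Lambda_\ell^{t-1}\right), \tag{8} $$

$$ \Gamma_i^t = h_i + \sum_{\ell \in \partial i} F\left(\Gamma_\ell^{t-1}\right), \tag{9} $$

where $\Lambda_i^0 = \infty$ if $\tau_i = +$ and $\Lambda_i^0 = -\infty$ if $\tau_i = -$; $\Gamma_i^0 = \gamma$
if $\tilde\tau_i = +$ and $\Gamma_i^0 = -\gamma$ if $\tilde\tau_i = -$. It follows that for $t \ge 1$,

$$ X_i^t = \frac{\prod_{\ell \in \partial i} (1 + \theta X_\ell^{t-1}) - e^{-2h_i} \prod_{\ell \in \partial i} (1 - \theta X_\ell^{t-1})}{\prod_{\ell \in \partial i} (1 + \theta X_\ell^{t-1}) + e^{-2h_i} \prod_{\ell \in \partial i} (1 - \theta X_\ell^{t-1})}, \tag{10} $$

$$ Y_i^t = \frac{\prod_{\ell \in \partial i} (1 + \theta Y_\ell^{t-1}) - e^{-2h_i} \prod_{\ell \in \partial i} (1 - \theta Y_\ell^{t-1})}{\prod_{\ell \in \partial i} (1 + \theta Y_\ell^{t-1}) + e^{-2h_i} \prod_{\ell \in \partial i} (1 - \theta Y_\ell^{t-1})}, \tag{11} $$

where $\theta = \tanh(\beta) = (a - b)/(a + b)$; $X_i^0 = 1$ if $\tau_i = +$ and $X_i^0 = -1$ if
$\tau_i = -$; $Y_i^0 = 1 - 2\alpha$ if $\tilde\tau_i = +$ and $Y_i^0 = 2\alpha - 1$ if $\tilde\tau_i = -$.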
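The recursions of Lemma 3.4 translate directly into code. The following sketch is an illustration with hypothetical names and a nested-dictionary tree (keys `tau`, `noisy`, `children`, labels encoded as `+1`/`-1`). It computes the half log likelihood ratio through (8) or (9) and the magnetization through the product form (10) or (11), so that the relation between the two, magnetization equal to the hyperbolic tangent of the log likelihood ratio, can be checked numerically. It assumes $0 < \alpha < 1/2$ and $a, b > 0$.

```python
import math

def half_llr(node, depth, theta, gamma, exact_boundary):
    """Lambda (exact_boundary=True) or Gamma (False) for the subtree of the given depth."""
    if depth == 0:
        if exact_boundary:
            return math.inf * node["tau"]          # Lambda^0 = +inf or -inf
        return gamma * node["noisy"]               # Gamma^0 = +gamma or -gamma
    total = gamma * node["noisy"]                  # external field h_i
    for child in node["children"]:
        x = half_llr(child, depth - 1, theta, gamma, exact_boundary)
        total += math.atanh(theta * math.tanh(x))  # F(x)
    return total

def magnetization(node, depth, theta, gamma, exact_boundary):
    """X (exact_boundary=True) or Y (False) via the product form of the recursion."""
    if depth == 0:
        if exact_boundary:
            return float(node["tau"])
        return math.tanh(gamma) * node["noisy"]    # equals +(1 - 2 alpha) or -(1 - 2 alpha)
    plus, minus = 1.0, math.exp(-2 * gamma * node["noisy"])
    for child in node["children"]:
        m = magnetization(child, depth - 1, theta, gamma, exact_boundary)
        plus *= 1 + theta * m
        minus *= 1 - theta * m
    return (plus - minus) / (plus + minus)

def leaf(tau, noisy):
    return {"tau": tau, "noisy": noisy, "children": []}

# A small hand-built tree of depth 2; a = 5, b = 2, alpha = 0.2 are arbitrary example values.
example = {"tau": 1, "noisy": -1, "children": [
    {"tau": 1, "noisy": 1, "children": [leaf(1, 1), leaf(-1, 1)]},
    leaf(-1, -1),
]}
theta_ex = (5.0 - 2.0) / (5.0 + 2.0)
gamma_ex = 0.5 * math.log(0.8 / 0.2)
```

Using `math.inf` for the exact boundary labels works because `math.tanh` maps it to `1.0`, so `F` evaluates to $\pm\beta$ there.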
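Proof. By definition, the claims for $\Lambda_i^0$ ($\Gamma_i^0$) and $X_i^0$ ($Y_i^0$) hold. We prove the
claims for $\Lambda_u^t$ with $t \ge 1$; the claims for $\Lambda_i^t$ for $i \neq u$ and $\Gamma_i^t$
follow similarly.

A key point is to use the independent splitting property of the Poisson distribution to give
an equivalent description of the numbers of children with each label for any vertex in the tree.
Instead of separately generating the number of children with each label, we can first generate
the total number of children and then independently and randomly label each child. Specifically,
for every vertex $i$ in $T_u$, let $W_i$ denote the total number of its children and assume
$W_i \sim \mathrm{Pois}(d)$ with $d = (a + b)/2$. For each child $j \in \partial i$, independently of
everything else, $\tau_j = \tau_i$ with probability $a/(a + b)$ and $\tau_j = -\tau_i$ with probability
$b/(a + b)$. With this view, the observation of the tree $T_u^t$ itself gives no information on the
label of $u$. The side information and then the conditionally independent messages from those
children provide information. To be precise, we have that

$$
\begin{aligned}
\Lambda_u^t
&= \frac{1}{2} \log \frac{\mathbb{P}\{\tilde\tau_u | \tau_u = +\} \prod_{i \in \partial u} \mathbb{P}\{T_i^{t-1}, \tau_{\partial T_i^{t-1}}, \tilde\tau_{T_i^{t-1}} | \tau_u = +\}}{\mathbb{P}\{\tilde\tau_u | \tau_u = -\} \prod_{i \in \partial u} \mathbb{P}\{T_i^{t-1}, \tau_{\partial T_i^{t-1}}, \tilde\tau_{T_i^{t-1}} | \tau_u = -\}} \\
&= h_u + \frac{1}{2} \sum_{i \in \partial u} \log \frac{\sum_{x \in \{\pm\}} \mathbb{P}\{T_i^{t-1}, \tau_{\partial T_i^{t-1}}, \tilde\tau_{T_i^{t-1}}, \tau_i = x | \tau_u = +\}}{\sum_{x \in \{\pm\}} \mathbb{P}\{T_i^{t-1}, \tau_{\partial T_i^{t-1}}, \tilde\tau_{T_i^{t-1}}, \tau_i = x | \tau_u = -\}} \\
&= h_u + \frac{1}{2} \sum_{i \in \partial u} \log \frac{\sum_{x \in \{\pm\}} \mathbb{P}\{\tau_i = x | \tau_u = +\} \, \mathbb{P}\{T_i^{t-1}, \tau_{\partial T_i^{t-1}}, \tilde\tau_{T_i^{t-1}} | \tau_i = x\}}{\sum_{x \in \{\pm\}} \mathbb{P}\{\tau_i = x | \tau_u = -\} \, \mathbb{P}\{T_i^{t-1}, \tau_{\partial T_i^{t-1}}, \tilde\tau_{T_i^{t-1}} | \tau_i = x\}} \\
&= h_u + \frac{1}{2} \sum_{i \in \partial u} \log \frac{a \, \mathbb{P}\{T_i^{t-1}, \tau_{\partial T_i^{t-1}}, \tilde\tau_{T_i^{t-1}} | \tau_i = +\} + b \, \mathbb{P}\{T_i^{t-1}, \tau_{\partial T_i^{t-1}}, \tilde\tau_{T_i^{t-1}} | \tau_i = -\}}{b \, \mathbb{P}\{T_i^{t-1}, \tau_{\partial T_i^{t-1}}, \tilde\tau_{T_i^{t-1}} | \tau_i = +\} + a \, \mathbb{P}\{T_i^{t-1}, \tau_{\partial T_i^{t-1}}, \tilde\tau_{T_i^{t-1}} | \tau_i = -\}} \\
&= h_u + \frac{1}{2} \sum_{i \in \partial u} \log \frac{e^{2\beta + 2\Lambda_i^{t-1}} + 1}{e^{2\Lambda_i^{t-1}} + e^{2\beta}},
\end{aligned}
$$

where the first equality holds because $T_u^t$ is independent of $\tau_u$, and conditional on $\tau_u$,
$\tilde\tau_u$ and $(T_i^{t-1}, \tau_{\partial T_i^{t-1}}, \tilde\tau_{T_i^{t-1}})$ for all $i \in \partial u$
are independent; the second equality holds due to the fact that
$\tilde\tau_u = \tau_u$ with probability $1 - \alpha$ and $\tilde\tau_u = -\tau_u$ with probability $\alpha$;
the third equality follows because conditional on $\tau_i$, $\tau_u$ is independent of
$(T_i^{t-1}, \tau_{\partial T_i^{t-1}}, \tilde\tau_{T_i^{t-1}})$; the fourth equality holds because
$\tau_i = \tau_u$ with probability $a/(a + b)$ and $\tau_i = -\tau_u$ with probability $b/(a + b)$; the last
equality follows from the definition of $\beta$ and $\Lambda_i^{t-1}$. By the definition of $\tanh(x)$ and
the fact that $\tanh^{-1}(x) = \frac{1}{2} \log \frac{1+x}{1-x}$ for $x \in [-1, 1]$, it follows that

$$
\begin{aligned}
\Lambda_u^t
&= h_u + \frac{1}{2} \sum_{i \in \partial u} \log \frac{1 + \tanh(\beta) \tanh(\Lambda_i^{t-1})}{1 - \tanh(\beta) \tanh(\Lambda_i^{t-1})} \\
&= h_u + \sum_{i \in \partial u} \tanh^{-1}\left( \tanh(\beta) \tanh(\Lambda_i^{t-1}) \right).
\end{aligned}
$$

Finally, notice that $X_u^t = \tanh(\Lambda_u^t)$ and thus

$$
\begin{aligned}
X_u^t
&= \tanh\left( h_u + \sum_{i \in \partial u} \tanh^{-1}\left( \tanh(\beta) X_i^{t-1} \right) \right) \\
&= \tanh\left( h_u + \frac{1}{2} \sum_{i \in \partial u} \log \frac{1 + \tanh(\beta) X_i^{t-1}}{1 - \tanh(\beta) X_i^{t-1}} \right) \\
&= \frac{\prod_{i \in \partial u} (1 + \tanh(\beta) X_i^{t-1}) - e^{-2h_u} \prod_{i \in \partial u} (1 - \tanh(\beta) X_i^{t-1})}{\prod_{i \in \partial u} (1 + \tanh(\beta) X_i^{t-1}) + e^{-2h_u} \prod_{i \in \partial u} (1 - \tanh(\beta) X_i^{t-1})}.
\end{aligned}
$$

□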
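The passage from the exponential form to the hyperbolic-tangent form in the proof can be checked directly. Writing $\tanh(x) = \frac{e^{2x} - 1}{e^{2x} + 1}$ and abbreviating $\Lambda = \Lambda_i^{t-1}$ (a shorthand used only in this display), a short computation gives:

```latex
\begin{aligned}
\frac{1 + \tanh(\beta)\tanh(\Lambda)}{1 - \tanh(\beta)\tanh(\Lambda)}
&= \frac{(e^{2\beta}+1)(e^{2\Lambda}+1) + (e^{2\beta}-1)(e^{2\Lambda}-1)}
        {(e^{2\beta}+1)(e^{2\Lambda}+1) - (e^{2\beta}-1)(e^{2\Lambda}-1)} \\
&= \frac{2e^{2\beta+2\Lambda} + 2}{2e^{2\Lambda} + 2e^{2\beta}}
 = \frac{e^{2\beta+2\Lambda} + 1}{e^{2\Lambda} + e^{2\beta}} .
\end{aligned}
```

Taking half the logarithm of both sides recovers the summand in the last line of the first chain of equalities.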


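As a corollary of Lemma 3.4, $\Lambda_u^t$ is monotone with respect to the boundary conditions. For any
vertex $i$ in $T_u$, let

$$
\Lambda_i^t(\xi) = \frac{1}{2} \log \frac{\mathbb{P}\{T_i^t, \tau_{\partial T_i^t} = \xi, \tilde\tau_{T_i^t} \mid \tau_i = +\}}{\mathbb{P}\{T_i^t, \tau_{\partial T_i^t} = \xi, \tilde\tau_{T_i^t} \mid \tau_i = -\}},
$$

where $\xi \in \{\pm\}^{|\partial T_i^t|}$.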


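Corollary 3.5. Fix any vertex $i$ in $T_u$ and $t \ge 1$. If the boundary conditions $\xi$ and $\tilde\xi$ are such that
$\xi_\ell \ge \tilde\xi_\ell$ for all $\ell \in \partial T_i^t$, then $\Lambda_i^t(\xi) \ge \Lambda_i^t(\tilde\xi)$ for $a > b$ and $\Lambda_i^t(\xi) \le \Lambda_i^t(\tilde\xi)$ otherwise. In particular,
$\Lambda_i^t(\xi)$ is maximized for $a > b$ and minimized for $a < b$, when $\xi_\ell = +$ for all $\ell \in \partial T_i^t$.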


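Proof. Recall that $F(x) = \tanh^{-1}(\tanh(\beta) \tanh(x))$. Then $F(\pm\infty) = \pm\beta$. We prove the corollary
by induction. Suppose that $a > b$ and thus $\beta > 0$. We first check the base case $t = 1$. By definition,
$\Lambda_\ell^0(+) = \infty$ and $\Lambda_\ell^0(-) = -\infty$. In view of (8),

$$
\Lambda_i^1(\xi) - \Lambda_i^1(\tilde\xi) = \sum_{\ell \in \partial i} \left[ F(\Lambda_\ell^0(\xi_\ell)) - F(\Lambda_\ell^0(\tilde\xi_\ell)) \right] = 2\beta \sum_{\ell \in \partial i} \mathbf{1}\{\xi_\ell = +, \tilde\xi_\ell = -\} \ge 0.
$$

Suppose the claim holds for $t \ge 1$; we need to prove that the claim holds for $t+1$. In particular,
in view of (8),

$$
\Lambda_i^{t+1}(\xi) - \Lambda_i^{t+1}(\tilde\xi) = \sum_{\ell \in \partial i} \left[ F\big(\Lambda_\ell^t(\xi_{\partial T_\ell^t})\big) - F\big(\Lambda_\ell^t(\tilde\xi_{\partial T_\ell^t})\big) \right].
$$

Notice that

$$
F'(x) = \frac{\tanh(\beta)\,(1 - \tanh^2(x))}{1 - \tanh^2(\beta) \tanh^2(x)}.
$$

Since $\beta > 0$ and thus $0 < \tanh(\beta) < 1$, it follows that $0 \le F'(x) \le \tanh(\beta)$. By the induction
hypothesis, $\Lambda_\ell^t(\xi) \ge \Lambda_\ell^t(\tilde\xi)$. Hence, $\Lambda_i^{t+1}(\xi) \ge \Lambda_i^{t+1}(\tilde\xi)$ and the corollary follows in case $a > b$. The
proof for $a < b$ is the same except that $\beta < 0$; thus $\Lambda_\ell^t(\xi) \le \Lambda_\ell^t(\tilde\xi)$ and $\tanh(\beta) \le F'(x) \le 0$. □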
3.1 Connection between the graph problem and tree problems

For the detection problem on the graph, recall that $p_{G_n}(\hat\sigma_{\mathrm{BP}}^t)$ denotes the estimation accuracy of
$\hat\sigma_{\mathrm{BP}}^t$ as per (1); $p^*_{G_n}$ is the optimal estimation accuracy. For the detection problems on the tree, recall that
$p^*_{T^t}$ is the optimal estimation accuracy of estimating $\tau_u$ based on $T_u^t$, $\tilde\tau_{T_u^t}$, and $\tau_{\partial T_u^t}$ as per (6); $q^*_{T^t}$
is the optimal estimation accuracy of estimating $\tau_u$ based on $T_u^t$ and $\tilde\tau_{T_u^t}$ as per (7). In this section,
we show that $p_{G_n}(\hat\sigma_{\mathrm{BP}}^t)$ equals $q^*_{T^t}$ asymptotically, and $p^*_{G_n}$ is bounded by $p^*_{T^t}$ from above for
any $t \ge 1$. Notice that the dependency of $q^*_{T^t}$ and $p^*_{T^t}$ on $n$ is only through the dependency of $a$
and $b$ on $n$. Hence, if $a$ and $b$ are fixed constants, then both $q^*_{T^t}$ and $p^*_{T^t}$ do not depend on $n$.

A key ingredient is to show that $G$ is locally tree-like in the regime $a = n^{o(1)}$. Let $G_u^t$ denote the
subgraph of $G$ induced by vertices whose distance to $u$ is at most $t$ and let $\partial G_u^t$ denote the set of
vertices whose distance from $u$ is precisely $t$. With a slight abuse of notation, let $\sigma_A$ denote the vector
consisting of labels of vertices in $A$, where $A$ could be either a set of vertices or a subgraph in $G$.
Similarly we define $\tilde\sigma_A$. The following lemma proved in [39] shows that we can construct a coupling
such that $(G^t, \sigma_{G^t}, \tilde\sigma_{G^t}) = (T^t, \tau_{T^t}, \tilde\tau_{T^t})$ with probability converging to 1 when $a^t = n^{o(1)}$.

Lemma 3.6. For $t = t(n)$ such that $a^t = n^{o(1)}$, there exists a coupling between $(G, \sigma, \tilde\sigma)$ and
$(T, \tau, \tilde\tau)$ such that $(G_u^t, \sigma_{G_u^t}, \tilde\sigma_{G_u^t}) = (T_u^t, \tau_{T_u^t}, \tilde\tau_{T_u^t})$ with probability converging to 1.

In the following, for ease of notation, we write $T_u^t$ as $T^t$ and $G_u^t$ as $G^t$ when there is no ambiguity.
Suppose that $(G^t, \sigma_{G^t}, \tilde\sigma_{G^t}) = (T^t, \tau_{T^t}, \tilde\tau_{T^t})$; then by comparing BP iterations (4) and (5) with the
recursions of the log likelihood ratio $\Gamma^t$ (9), we find that the BP output at $u$ exactly equals $\Gamma_u^t$. In other words, when
the local neighborhood of $u$ is a tree, the BP algorithm defined in Algorithm 1 exactly computes the
log likelihood ratio $\Gamma_u^t$ for the tree model. Building upon this intuition, the following lemma shows
that $p_{G_n}(\hat\sigma_{\mathrm{BP}}^t)$ equals $q^*_{T^t}$ asymptotically.

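Lemma 3.7. For $t = t(n)$ such that $a^t = n^{o(1)}$,

$$
\lim_{n \to \infty} \left| p_{G_n}(\hat\sigma_{\mathrm{BP}}^t) - q^*_{T^t} \right| = 0.
$$

Proof. In view of Lemma 3.6, we can construct a coupling such that $(G^t, \sigma_{G^t}, \tilde\sigma_{G^t}) = (T^t, \tau_{T^t}, \tilde\tau_{T^t})$
with probability converging to 1. On the event $(G^t, \sigma_{G^t}, \tilde\sigma_{G^t}) = (T^t, \tau_{T^t}, \tilde\tau_{T^t})$, the BP output at $u$ equals $\Gamma_u^t$.
Hence,

$$
p_{G_n}(\hat\sigma_{\mathrm{BP}}^t) = q^*_{T^t} + o(1), \qquad (12)
$$

where the $o(1)$ term comes from the coupling error. □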

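We are going to show that as $n \to \infty$, $p^*_{G_n}$ is bounded by $p^*_{T^t}$ from above for any $t \ge 1$.
Before that, we need a key lemma which shows that conditional on $(G^t, \tilde\sigma_{G^t}, \sigma_{\partial G^t})$, $\sigma_u$ is almost
independent of the graph structure and noisy labels outside of $G^t$.

Lemma 3.8. For $t = t(n)$ such that $a^t = n^{o(1)}$, there exists a sequence of events $\mathcal{E}_n$ such that
$\mathbb{P}\{\mathcal{E}_n\} \to 1$ as $n \to \infty$, and on event $\mathcal{E}_n$,

$$
\mathbb{P}\{\sigma_u = x \mid G^t, \tilde\sigma_{G^t}, \sigma_{\partial G^t}\} = (1 + o(1))\, \mathbb{P}\{\sigma_u = x \mid G, \tilde\sigma, \sigma_{\partial G^t}\}, \quad \forall x \in \{\pm\}. \qquad (13)
$$

Moreover, on event $\mathcal{E}_n$, $(G^t, \sigma_{G^t}, \tilde\sigma_{G^t}) = (T^t, \tau_{T^t}, \tilde\tau_{T^t})$ holds.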
Proof. Recall that $G^t$ is the subgraph of $G$ induced by vertices whose distance from $u$ is at most $t$.
Let $A^t$ denote the set of vertices in $G^{t-1}$, $B^t$ denote the set of vertices in $G^t$, and $C^t$ denote the set
of vertices in $G$ but not in $G^t$. Then $A^t \cup \partial G^t = B^t$ and $A^t \cup \partial G^t \cup C^t = V$. Define $s_A = \sum_{i \in A^t} \sigma_i$
and $s_C = \sum_{i \in C^t} \sigma_i$. Let

$$
\mathcal{E}_n = \left\{ (\sigma_C, G^t) : |s_C| \le n^{0.6},\ |B| \le n^{0.1},\ (G^t, \sigma_{G^t}, \tilde\sigma_{G^t}) = (T^t, \tau_{T^t}, \tilde\tau_{T^t}) \right\}.
$$

Next, we show $\mathbb{P}\{\mathcal{E}_n\} \to 1$.

By the assumption $a^t = n^{o(1)}$, it follows that $(G^t, \sigma_{G^t}, \tilde\sigma_{G^t}) = (T^t, \tau_{T^t}, \tilde\tau_{T^t})$ and $|B| = n^{o(1)}$ with
probability converging to 1 (see [39, Proposition 4.2] for a formal proof). Note that $s_C = 2X - |C|$
for some $X \sim \mathrm{Binom}(|C|, 1/2)$. Letting $\alpha_n = n^{0.6}$, in view of the Bernstein inequality,

$$
\mathbb{P}\{|s_C| > \alpha_n\} = \mathbb{P}\{|X - |C|/2| > \alpha_n/2\} \le 2 \exp\left( - \frac{\alpha_n^2/4}{|C|/2 + \alpha_n/3} \right) = o(1),
$$

where the last equality holds because $|C| \le n$ and $\alpha_n/\sqrt{n} \to \infty$. In conclusion, we have that
$\mathbb{P}\{\mathcal{E}_n\} \to 1$ as $n \to \infty$.
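To see how quickly the Bernstein bound above vanishes, here is a minimal sketch that evaluates its right-hand side in the worst case $|C| = n$ with threshold $n^{0.6}$. The function name and the sample values of $n$ are hypothetical; the code only evaluates the stated bound and does not simulate the model.

```python
import math

def bernstein_tail_bound(n, exponent=0.6):
    """Evaluate 2 * exp(-(t^2 / 4) / (n / 2 + t / 3)) with t = n ** exponent.

    Uses |C| = n, which is the worst case because the bound increases with |C|.
    """
    t = n ** exponent
    return 2.0 * math.exp(-(t * t / 4.0) / (n / 2.0 + t / 3.0))

# Hypothetical sample sizes, for illustration only.
sizes = [10 ** 3, 10 ** 5, 10 ** 7, 10 ** 9]
bounds = [bernstein_tail_bound(n) for n in sizes]
for n, bound in zip(sizes, bounds):
    print(n, bound)
```

The exponent of the bound scales like $n^{0.2}/2$, so the decay is slow for moderate $n$ but still tends to zero.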

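Finally, we prove that (13) holds. We start by proving that on event $\mathcal{E}_n$,

$$
\mathbb{P}\{\sigma_u = x, G^t, \tilde\sigma_{G^t}, \sigma_{\partial G^t}\} = (1 + o(1))\, K\, \mathbb{P}\{\sigma_u = x, G^t, \tilde\sigma_{G^t}, \sigma_{\partial G^t}, \sigma_C\}, \quad \forall x \in \{\pm\}, \qquad (14)
$$

for some $K$ which does not depend on $x$. To proceed, we write the joint distribution of $\sigma$, $G$, and
$\tilde\sigma$ in a product form.

For any two sets $U_1, U_2 \subset V$, define

$$
\Phi_{U_1, U_2}(G, \sigma) = \prod_{(u,v) \in U_1 \times U_2} \phi_{uv}(G, \sigma),
$$

where $(u, v)$ denotes an unordered pair of vertices and

$$
\phi_{uv}(G, \sigma) =
\begin{cases}
a/n & \text{if } \sigma_u = \sigma_v,\ (u,v) \in E, \\
b/n & \text{if } \sigma_u \ne \sigma_v,\ (u,v) \in E, \\
1 - a/n & \text{if } \sigma_u = \sigma_v,\ (u,v) \notin E, \\
1 - b/n & \text{if } \sigma_u \ne \sigma_v,\ (u,v) \notin E.
\end{cases}
$$

For any set $U \subset V$, define $\Phi_U(\sigma, \tilde\sigma) = \prod_{u \in U} \phi_u(\sigma_u, \tilde\sigma_u)$, where

$$
\phi_u(\sigma_u, \tilde\sigma_u) =
\begin{cases}
1 - \alpha & \text{if } \sigma_u = \tilde\sigma_u, \\
\alpha & \text{if } \sigma_u \ne \tilde\sigma_u.
\end{cases}
$$

Then the joint distribution of $\sigma$, $G$, and $\tilde\sigma$ can be written as

$$
\mathbb{P}\{\sigma, G, \tilde\sigma\} = 2^{-n}\, \Phi_C\, \Phi_B\, \Phi_{B,B}\, \Phi_{C,C}\, \Phi_{\partial G^t, C}\, \Phi_{A,C}.
$$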

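Notice that $A$ and $C$ are disconnected. We claim that on event $\mathcal{E}_n$, $\Phi_{A,C}$ only depends on $\sigma$ through
the $o(1)$ term. In particular, on event $\mathcal{E}_n$,

$$
\begin{aligned}
\Phi_{A,C}(G, \sigma)
&= \prod_{(u,v) \in A \times C} \phi_{uv}(G, \sigma) \\
&= \left(1 - \frac{a}{n}\right)^{(|A||C| + s_A s_C)/2} \left(1 - \frac{b}{n}\right)^{(|A||C| - s_A s_C)/2} \\
&= (1 + o(1)) \left(1 - \frac{a}{n}\right)^{|A||C|/2} \left(1 - \frac{b}{n}\right)^{|A||C|/2} \triangleq (1 + o(1))\, K(|A|, |C|),
\end{aligned}
$$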
where $K(|A|, |C|)$ only depends on $|A|$ and $|C|$; the second equality holds because $u \in A$ and $v \in C$
implies that $(u,v) \notin E$ and thus $\phi_{uv}$ is either $1 - a/n$ or $1 - b/n$; the third equality holds because
$|s_A s_C| \le |B||s_C| \le n^{0.7} = o(n)$. As a consequence,

$$
\mathbb{P}\{\sigma, G, \tilde\sigma, \mathcal{E}_n\} = (1 + o(1))\, 2^{-n} K(|A|, |C|)\, \Phi_C\, \Phi_B\, \Phi_{B,B}\, \Phi_{C,C}\, \Phi_{\partial G^t, C}.
$$

It follows that

$$
\mathbb{P}\{\sigma_u = x, G^t, \tilde\sigma_{G^t}, \sigma_{\partial G^t}, \sigma_C, \mathcal{E}_n\} = (1 + o(1))\, 2^{-n} K(|A|, |C|) \sum_{\sigma_{A \setminus \{u\}}} \Phi_B\, \Phi_{B,B} = (1 + o(1))\, K',
$$

where $K'$ does not depend on $\sigma_C$. Hence,

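$$
\begin{aligned}
\mathbb{P}\{\sigma_u = x, G^t, \tilde\sigma_{G^t}, \sigma_{\partial G^t}, \mathcal{E}_n\}
&= \sum_{\sigma_C} \mathbb{P}\{\sigma_u = x, G^t, \tilde\sigma_{G^t}, \sigma_{\partial G^t}, \sigma_C, \mathcal{E}_n\} \\
&= (1 + o(1))\, K' \sum_{\sigma_C} \mathbf{1}\{|s_C| \le n^{0.6}\} = (1 + o(1))\, K'\, 2^{|C|}\, \mathbb{P}\{|s_C| \le n^{0.6}\},
\end{aligned}
$$

and the desired (14) follows with $K = 2^{|C|}\, \mathbb{P}\{|s_C| \le n^{0.6}\}$. By Bayes' rule, it follows from (14) that
on event $\mathcal{E}_n$,

$$
\mathbb{P}\{\sigma_u = x \mid G^t, \tilde\sigma_{G^t}, \sigma_{\partial G^t}\} = (1 + o(1))\, \mathbb{P}\{\sigma_u = x \mid G^t, \tilde\sigma_{G^t}, \sigma_{\partial G^t}, \sigma_C\}, \quad \forall x \in \{\pm\}.
$$

Hence, on event $\mathcal{E}_n$,

$$
\begin{aligned}
\mathbb{P}\{\sigma_u = x \mid G, \tilde\sigma, \sigma_{\partial G^t}\}
&= \sum_{\sigma_C} \mathbb{P}\{\sigma_u = x, \sigma_C \mid G, \tilde\sigma, \sigma_{\partial G^t}\} \\
&= \sum_{\sigma_C} \mathbb{P}\{\sigma_C \mid G, \tilde\sigma, \sigma_{\partial G^t}\}\, \mathbb{P}\{\sigma_u = x \mid G, \tilde\sigma, \sigma_{\partial G^t}, \sigma_C\} \\
&= \sum_{\sigma_C} \mathbb{P}\{\sigma_C \mid G, \tilde\sigma, \sigma_{\partial G^t}\}\, \mathbb{P}\{\sigma_u = x \mid G^t, \tilde\sigma_{G^t}, \sigma_{\partial G^t}, \sigma_C\} \\
&= (1 + o(1))\, \mathbb{P}\{\sigma_u = x \mid G^t, \tilde\sigma_{G^t}, \sigma_{\partial G^t}\},
\end{aligned}
$$

where the third equality holds because conditional on $(G^t, \tilde\sigma_{G^t}, \sigma_{\partial G^t}, \sigma_C)$, $\sigma_u$ is independent of the
graph structure and noisy labels outside of $G^t$; the last equality follows due to the last displayed
equation. Hence, on the event $\mathcal{E}_n$, the desired (13) holds. □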

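Lemma 3.9. For $t = t(n)$ such that $a^t = n^{o(1)}$,

$$
\limsup_{n \to \infty} \left( p^*_{G_n} - p^*_{T^t} \right) \le 0.
$$

Proof. In view of (2),

$$
p^*_{G_n} = \frac{1}{2} \mathbb{E}\left[ \left| \mathbb{P}\{\sigma_u = + \mid G, \tilde\sigma\} - \mathbb{P}\{\sigma_u = - \mid G, \tilde\sigma\} \right| \right] + \frac{1}{2}.
$$

Consider estimating $\sigma_u$ based on $G$ and $\tilde\sigma$. For a fixed $t \in \mathbb{N}$, suppose a genie reveals the labels of
all vertices whose distance from $u$ is precisely $t$, and let $\hat\sigma_{\mathrm{Oracle},t}$ denote the optimal oracle estimator
given by

$$
\hat\sigma_{\mathrm{Oracle},t}(u) = 2 \times \mathbf{1}\left\{ \mathbb{P}\{\sigma_u = + \mid G, \tilde\sigma, \sigma_{\partial G^t}\} \ge \mathbb{P}\{\sigma_u = - \mid G, \tilde\sigma, \sigma_{\partial G^t}\} \right\} - 1.
$$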
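Let $p_{G_n}(\hat\sigma_{\mathrm{Oracle},t})$ denote the success probability of the oracle estimator, which is given by

$$
p_{G_n}(\hat\sigma_{\mathrm{Oracle},t}) = \frac{1}{2} \mathbb{E}\left[ \left| \mathbb{P}\{\sigma_u = + \mid G, \tilde\sigma, \sigma_{\partial G^t}\} - \mathbb{P}\{\sigma_u = - \mid G, \tilde\sigma, \sigma_{\partial G^t}\} \right| \right] + \frac{1}{2}.
$$

Since $\hat\sigma_{\mathrm{Oracle},t}(u)$ is optimal with the extra information $\sigma_{\partial G^t}$, it follows that $p_{G_n}(\hat\sigma_{\mathrm{Oracle},t}) \ge p^*_{G_n}$
for all $t$ and $n$. Lemma 3.8 implies that there exists a sequence of events $\mathcal{E}_n$ such that $\mathbb{P}\{\mathcal{E}_n\} \to 1$
and on event $\mathcal{E}_n$,

$$
\mathbb{P}\{\sigma_u = x \mid G^t, \tilde\sigma_{G^t}, \sigma_{\partial G^t}\} = (1 + o(1))\, \mathbb{P}\{\sigma_u = x \mid G, \tilde\sigma, \sigma_{\partial G^t}\}, \quad \forall x \in \{\pm\},
$$

and $(G^t, \sigma_{G^t}, \tilde\sigma_{G^t}) = (T^t, \tau_{T^t}, \tilde\tau_{T^t})$ holds. It follows that

$$
\begin{aligned}
p_{G_n}(\hat\sigma_{\mathrm{Oracle},t})
&= \frac{1}{2} \mathbb{E}\left[ \left| \mathbb{P}\{\sigma_u = + \mid G, \tilde\sigma, \sigma_{\partial G^t}\} - \mathbb{P}\{\sigma_u = - \mid G, \tilde\sigma, \sigma_{\partial G^t}\} \right| \mathbf{1}\{\mathcal{E}_n\} \right] + \frac{1}{2} + o(1) \\
&= \frac{1}{2} \mathbb{E}\left[ \left| \mathbb{P}\{\sigma_u = + \mid G^t, \tilde\sigma_{G^t}, \sigma_{\partial G^t}\} - \mathbb{P}\{\sigma_u = - \mid G^t, \tilde\sigma_{G^t}, \sigma_{\partial G^t}\} \right| \mathbf{1}\{\mathcal{E}_n\} \right] + \frac{1}{2} + o(1) \\
&= \frac{1}{2} \mathbb{E}\left[ \left| \mathbb{P}\{\tau_u = + \mid T^t, \tilde\tau_{T^t}, \tau_{\partial T^t}\} - \mathbb{P}\{\tau_u = - \mid T^t, \tilde\tau_{T^t}, \tau_{\partial T^t}\} \right| \mathbf{1}\{\mathcal{E}_n\} \right] + \frac{1}{2} + o(1) \\
&= \frac{1}{2} \mathbb{E}\left[ \left| \mathbb{P}\{\tau_u = + \mid T^t, \tilde\tau_{T^t}, \tau_{\partial T^t}\} - \mathbb{P}\{\tau_u = - \mid T^t, \tilde\tau_{T^t}, \tau_{\partial T^t}\} \right| \right] + \frac{1}{2} + o(1) \\
&= p^*_{T^t} + o(1).
\end{aligned}
$$

Hence,

$$
p^*_{G_n} - p^*_{T^t} \le p_{G_n}(\hat\sigma_{\mathrm{Oracle},t}) - p^*_{T^t} \to 0.
$$

□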


The following is a simple corollary of Lemma 3.7 and Lemma 3.9.

Corollary 3.10. For $t = t(n)$ such that $a^t = n^{o(1)}$,

$$
\limsup_{n \to \infty} \left( p^*_{G_n} - p_{G_n}(\hat\sigma_{\mathrm{BP}}^t) \right) \le \limsup_{n \to \infty} \left( p^*_{T^t} - q^*_{T^t} \right).
$$

The above corollary implies that $\hat\sigma_{\mathrm{BP}}^t$ achieves the optimal estimation accuracy $p^*_{G_n}$ asymptotically,
provided that $p^*_{T^t} - q^*_{T^t}$ converges to 0, or equivalently, $\mathbb{E}[|X_u^t - Y_u^t|]$ converges to 0. Notice
that the only difference between $p^*_{T^t}$ and $q^*_{T^t}$ is that in the former, exact label information is revealed
at the boundary of $T^t$, while in the latter, only noisy label information at level $t$ is available. In
the next three sections, we provide three different sufficient conditions under which $\mathbb{E}[|X_u^t - Y_u^t|]$
converges to 0.

4 Optimality of local BP when $|a - b| < 2$

Recall that $\Lambda_u^t$ is a function of $\tau_{\partial T^t}$, i.e., the labels of vertices at the boundary of $T^t$. Let $\Lambda_u^t(+)$
denote $\Lambda_u^t$ with the labels of vertices at the boundary of $T^t$ all equal to $+$. Similarly define $\Lambda_u^t(-)$.
The following lemma shows that $\Lambda_u^t$ is asymptotically independent of $\tau_{\partial T^t}$ as $t \to \infty$ if $|a - b| < 2$.

Lemma 4.1. For all $t \ge 0$, let $e(t+1) = \mathbb{E}\left[ |\Lambda_u^{t+1}(+) - \Lambda_u^{t+1}(-)| \right]$. Then

$$
e(t+1) \le e(1) \left( |\tanh(\beta)|\, d \right)^t,
$$

where $e(1) = 2|\beta| d$ and $d = (a+b)/2$.
Proof. Recall that $|F'(x)| \le |\tanh(\beta)|$. It follows that

$$
|\Lambda_u^{t+1}(+) - \Lambda_u^{t+1}(-)| \le \sum_{i \in \partial u} \left| F(\Lambda_i^t(+)) - F(\Lambda_i^t(-)) \right| \le |\tanh(\beta)| \sum_{i \in \partial u} \left| \Lambda_i^t(+) - \Lambda_i^t(-) \right|.
$$

Notice that the subtree $T_i^t$ has the same distribution as $T_u^t$. Thus $\Lambda_i^t$ has the same distribution as
$\Lambda_u^t$. Moreover, the number of children of root $u$ is independent of $\Lambda_i^t$ for all $i \in \partial u$. Thus, taking
the expectation over both sides of the last displayed equation, we get that

$$
e(t+1) = \mathbb{E}\left[ |\Lambda_u^{t+1}(+) - \Lambda_u^{t+1}(-)| \right] \le |\tanh(\beta)|\, d\, \mathbb{E}\left[ |\Lambda_u^t(+) - \Lambda_u^t(-)| \right] = |\tanh(\beta)|\, d\, e(t).
$$

Moreover,

$$
e(1) = d\, |F(\infty) - F(-\infty)| = 2|\beta| d.
$$

□


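Theorem 4.2. If $|a - b| < 2$, then $\lim_{t \to \infty} \limsup_{n \to \infty} \mathbb{E}[|X_u^t - Y_u^t|] = 0$.

Proof. In view of Lemma 4.1, we have that

$$
\mathbb{E}\left[ |\Lambda_u^t(+) - \Lambda_u^t(-)| \right] \le 2|\beta| d \left( |\tanh(\beta)|\, d \right)^{t-1} = \left| \log \frac{a}{b} \right| \frac{a+b}{2} \left( \frac{|a-b|}{2} \right)^{t-1},
$$

where we used the fact that $d = (a+b)/2$ and $\beta = \log(a/b)/2$. It follows from Corollary 3.5 and
the facts that $X_u^t = \tanh(\Lambda_u^t)$ and $Y_u^t = \tanh(\Gamma_u^t)$ that

$$
|X_u^t - Y_u^t| \le |\Lambda_u^t - \Gamma_u^t| \le |\Lambda_u^t(+) - \Lambda_u^t(-)|.
$$

Combining the last two displayed equations gives

$$
\mathbb{E}\left[ |X_u^t - Y_u^t| \right] \le \left| \log \frac{a}{b} \right| \frac{a+b}{2} \left( \frac{|a-b|}{2} \right)^{t-1}.
$$

By the assumption, $|a - b| < 2$ for all sufficiently large $n$, and thus $\lim_{t \to \infty} \limsup_{n \to \infty} \mathbb{E}[|X_u^t - Y_u^t|] = 0$. □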
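The contraction argument of Lemma 4.1 and Theorem 4.2 can be illustrated numerically. The sketch below, under hypothetical values $a = 3$, $b = 1.5$ (so $|a - b| < 2$), checks on a grid that $|F'(x)| \le |\tanh(\beta)|$ and evaluates the bound $|\log(a/b)|\,\frac{a+b}{2}\,(|a-b|/2)^{t-1}$ for a few depths $t$. The function names are illustrative and not part of the original text.

```python
import math

def F_prime(x, beta):
    """Derivative of F(x) = atanh(tanh(beta) * tanh(x))."""
    th = math.tanh(beta)
    return th * (1 - math.tanh(x) ** 2) / (1 - th ** 2 * math.tanh(x) ** 2)

def boundary_influence_bound(a, b, t):
    """|log(a/b)| * (a + b) / 2 * (|a - b| / 2) ** (t - 1)."""
    return abs(math.log(a / b)) * (a + b) / 2.0 * (abs(a - b) / 2.0) ** (t - 1)

a, b = 3.0, 1.5  # hypothetical values with |a - b| < 2
beta = 0.5 * math.log(a / b)

# Largest slope of F on a grid of x values in [-5, 5].
max_slope = max(abs(F_prime(k / 100.0, beta)) for k in range(-500, 501))
depths = [1, 5, 10, 20]
bounds = [boundary_influence_bound(a, b, t) for t in depths]
print(max_slope, math.tanh(beta))
print(list(zip(depths, bounds)))
```

The slope is largest at $x = 0$, where it equals $\tanh(\beta) = (a-b)/(a+b)$, and the bound decays geometrically with ratio $|a-b|/2$.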


5 Optimality of local BP when $(a - b)^2 > C(a + b)$

Recall that $d = (a+b)/2$ and $\beta = \frac{1}{2} \log \frac{a}{b}$. We introduce the notation $\theta = \tanh(\beta)$ and $\eta = (1 - \theta)/2$.
Let $\mathbb{E}^+[X]$ and $\mathbb{E}^-[X]$ denote the expectation of $X$ conditional on $\tau_u = +$ and $\tau_u = -$, respectively.

Theorem 5.1. There exists a constant $C$ depending only on $c_0$ such that if $(a - b)^2 \ge C(a + b)$,
then $\lim_{t \to \infty} \limsup_{n \to \infty} \mathbb{E}[|X_u^t - Y_u^t|] = 0$.

The proof of Theorem 5.1 is divided into three steps. Firstly, we show that when $\theta^2 d = \frac{(a-b)^2}{2(a+b)}$
is large, then $\mathbb{E}^+[X_u^t]$ and $\mathbb{E}^+[Y_u^t]$ are close to 1 for all sufficiently large $t$. This result allows us
to analyze the recursions (10) and (11) by assuming $\mathbb{E}^+[X_u^t]$ and $\mathbb{E}^+[Y_u^t]$ are close to 1. Secondly,
we study the recursions of $\mathbb{E}[(X_u^t - Y_u^t)^2]$ when $|\theta|$ is small. Finally, we analyze the recursions of
$\mathbb{E}\big[\sqrt{|X_u^t - Y_u^t|}\big]$ when $|\theta|$ is large. The partition of the analysis of the recursions into small $|\theta|$ and large
$|\theta|$ cases, and the study of different moments of $|X_u^t - Y_u^t|$, are related to the fact that for different
values of $\theta$ we expect the distributions of $X_u^t$ and $Y_u^t$ to correspond to different power laws. When $\theta$ is
small, we have many small contributions from neighbors and therefore it is expected that $X_u^t$ and
$Y_u^t$ will have thin tails. When $\theta$ is large, we have a few large contributions from neighbors and we
therefore expect fat tails.
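5.1 Large expected magnetization

We first introduce a useful lemma to relate $\mathbb{E}^+[X_u^t]$ with $\mathbb{E}[|X_u^t|]$.

Lemma 5.2.

$$
\begin{aligned}
\mathbb{E}^+[X_u^t] &\ge 2\mathbb{E}[|X_u^t|] - 1, \\
\mathbb{E}^+[Y_u^t] &\ge 2\mathbb{E}[|Y_u^t|] - 1. \qquad (15)
\end{aligned}
$$

Proof. We prove the claim for $X_u^t$; the claim for $Y_u^t$ follows analogously. Observe that

$$
\begin{aligned}
\mathbb{E}^+[X_u^t]
&= \mathbb{E}^+\left[ X_u^t \mathbf{1}\{X_u^t \ge 0\} \right] + \mathbb{E}^+\left[ X_u^t \mathbf{1}\{X_u^t < 0\} \right] \\
&= \mathbb{E}^+\left[ |X_u^t| \mathbf{1}\{X_u^t \ge 0\} \right] - \mathbb{E}^+\left[ |X_u^t| \mathbf{1}\{X_u^t < 0\} \right] \\
&= \mathbb{E}^+\left[ |X_u^t| \right] - 2\mathbb{E}^+\left[ |X_u^t| \mathbf{1}\{X_u^t < 0\} \right] \\
&\ge \mathbb{E}^+\left[ |X_u^t| \right] - 2\mathbb{P}\{X_u^t < 0 \mid \tau_u = +\} \\
&= \mathbb{E}\left[ |X_u^t| \right] - 2\mathbb{P}\{X_u^t < 0 \mid \tau_u = +\}, \qquad (16)
\end{aligned}
$$

where the last inequality holds because $|X_u^t| \le 1$; the last equality holds due to $\mathbb{E}^+[|X_u^t|] = \mathbb{E}[|X_u^t|]$.
Moreover, by definition,

$$
\frac{1 + \mathbb{E}[|X_u^t|]}{2} = \frac{1}{2} \mathbb{P}\{X_u^t \ge 0 \mid \tau_u = +\} + \frac{1}{2} \mathbb{P}\{X_u^t < 0 \mid \tau_u = -\} \le \mathbb{P}\{X_u^t \ge 0 \mid \tau_u = +\},
$$

where the last inequality holds because by symmetry, $\mathbb{P}\{X_u^t < 0 \mid \tau_u = -\} = \mathbb{P}\{X_u^t > 0 \mid \tau_u = +\}$.
Thus $\mathbb{P}\{X_u^t < 0 \mid \tau_u = +\} \le \frac{1 - \mathbb{E}[|X_u^t|]}{2}$ and it follows from (16) that

$$
\mathbb{E}^+[X_u^t] \ge \mathbb{E}[|X_u^t|] - (1 - \mathbb{E}[|X_u^t|]) = 2\mathbb{E}[|X_u^t|] - 1. \qquad (17)
$$

□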


Lemma 5.3. There is a universal constant $C' > 0$ and $t^*(\theta, d, \alpha)$ such that for all $t \ge t^*(\theta, d, \alpha)$,

$$
\begin{aligned}
\mathbb{E}^+[X_u^t] &\ge 1 - \frac{2C'\eta}{\theta^2 d} - 2e^{-C'd}, \\
\mathbb{E}^+[Y_u^t] &\ge 1 - \frac{2C'\eta}{\theta^2 d} - 2e^{-C'd}.
\end{aligned}
$$

The proof of the lemma follows quite easily from a similar statement in [36] where the authors
considered models where less information is provided, i.e., there is only information on the leaves
of the trees (which are either noisy or not).


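Proof. Notice that given access to $T^t$, $\tau_{\partial T^t}$ and $\tilde\tau_{T^t}$, the optimal estimator of $\tau_u$ is $\mathrm{sgn}(X_u^t)$, whose
success probability is $(1 + \mathbb{E}[|X_u^t|])/2$. Define $\bar X_u^t = \mathbb{P}\{\tau_u = + \mid T^t, \tau_{\partial T^t}\} - \mathbb{P}\{\tau_u = - \mid T^t, \tau_{\partial T^t}\}$. It is
shown in [36] that there exists a universal constant $C' > 0$ and a $t^*(\theta, d)$ such that for $t \ge t^*$,

$$
\mathbb{E}\left[ |\bar X_u^t| \right] \ge 1 - \frac{C'\eta}{\theta^2 d} - e^{-C'd}.
$$

Consider the estimator $\mathrm{sgn}(\bar X_u^t)$, whose success probability is given by $(1 + \mathbb{E}[|\bar X_u^t|])/2$. Since
$\mathrm{sgn}(X_u^t)$ is the optimal estimator, it follows that

$$
\mathbb{E}\left[ |X_u^t| \right] \ge \mathbb{E}\left[ |\bar X_u^t| \right] \ge 1 - \frac{C'\eta}{\theta^2 d} - e^{-C'd}.
$$

Hence, by Lemma 5.2,

$$
\mathbb{E}^+[X_u^t] \ge 2\mathbb{E}[|X_u^t|] - 1 \ge 1 - \frac{2C'\eta}{\theta^2 d} - 2e^{-C'd}. \qquad (18)
$$

The claim for $\mathbb{E}^+[Y_u^t]$ can be proved similarly except that we need to assume $t \ge t^*(\theta, d, \alpha)$ for some
$t^*(\theta, d, \alpha)$. □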


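5.2 Small $|\theta|$ regime

In this subsection, we focus on the regime $|\theta| \le \theta^*$, where $\theta^*$ is a small constant to be specified
later. In this regime, to prove Theorem 5.1, it is sufficient to show the following proposition.

Proposition 5.4. There exist universal constants $\theta^* > 0$ and $C > 0$ such that if $|\theta| \le \theta^*$ and
$(a - b)^2 \ge C(a + b)$, then for all $t \ge t^*(\theta, d, \alpha)$,

$$
\mathbb{E}\left[ (X_u^{t+1} - Y_u^{t+1})^2 \right] \le \sqrt{\alpha(1-\alpha)}\, \mathbb{E}\left[ (X_u^t - Y_u^t)^2 \right].
$$

Proof. Note that $\mathbb{E}^+[(X_u^t - Y_u^t)^2] = \mathbb{E}^-[(X_u^t - Y_u^t)^2] = \mathbb{E}[(X_u^t - Y_u^t)^2]$. Hence, it suffices to show
that

$$
\mathbb{E}^+\left[ (X_u^{t+1} - Y_u^{t+1})^2 \right] \le \sqrt{\alpha(1-\alpha)}\, \mathbb{E}^+\left[ (X_u^t - Y_u^t)^2 \right]. \qquad (19)
$$

Fix $t^*(\theta, d, \alpha)$ as per Lemma 5.3 and consider $t \ge t^*$. Define $f = e^{h_u} \prod_{i \in \partial u} (1 + \theta X_i^t)$ and
$g = e^{-h_u} \prod_{i \in \partial u} (1 - \theta X_i^t)$, and let $f'$ and $g'$ be the corresponding quantities with $X$ replaced by $Y$.
Then it follows from the recursions (10) and (11) that

$$
\left| X_u^{t+1} - Y_u^{t+1} \right| = \left| \frac{f - g}{f + g} - \frac{f' - g'}{f' + g'} \right| = 2 \left| \frac{1}{1 + g/f} - \frac{1}{1 + g'/f'} \right| \le 8 \left| \left( \frac{g}{f} \right)^{1/4} - \left( \frac{g'}{f'} \right)^{1/4} \right|, \qquad (20)
$$

where in the last inequality, we apply the following inequality with $s = 1/4$:

$$
\left| (1 + x)^{-1} - (1 + y)^{-1} \right| \le \frac{1}{s} \left| x^s - y^s \right|,
$$

which holds for all $0 < s < 1$ and $x, y \ge 0$. Let $A_i = \left( \frac{1 - \theta X_i^t}{1 + \theta X_i^t} \right)^{1/4}$ and $B_i = \left( \frac{1 - \theta Y_i^t}{1 + \theta Y_i^t} \right)^{1/4}$ for
$i \in \partial u$. Then, it follows from (20) that

$$
\begin{aligned}
\mathbb{E}^+\left[ (X_u^{t+1} - Y_u^{t+1})^2 \mid \partial u \right]
&\le 64\, \mathbb{E}^+\left[ e^{-h_u} \Big( \prod_{i \in \partial u} A_i - \prod_{i \in \partial u} B_i \Big)^2 \,\Big|\, \partial u \right] \\
&= 128 \sqrt{\alpha(1-\alpha)}\, \mathbb{E}^+\left[ \Big( \prod_{i \in \partial u} A_i - \prod_{i \in \partial u} B_i \Big)^2 \,\Big|\, \partial u \right], \qquad (21)
\end{aligned}
$$

where the last equality holds because $h_u$ is independent of $\{A_i, B_i\}_{i \in \partial u}$ conditional on $\tau_u$ and $\partial u$,
and $\mathbb{E}^+[e^{-h_u}] = 2\sqrt{\alpha(1-\alpha)}$. It is shown in [36, Lemma 3.10] that

$$
\mathbb{E}^+\left[ \Big( \prod_{i \in \partial u} A_i - \prod_{i \in \partial u} B_i \Big)^2 \,\Big|\, \partial u \right] \le \binom{D}{2} m^{D-2} \left( \mathbb{E}^+[A^2 - B^2] \right)^2 + D m^{D-1}\, \mathbb{E}^+\left[ (A - B)^2 \right], \qquad (22)
$$

where $D = |\partial u|$, $(A, B)$ has the same distribution as $(A_i, B_i)$ for $i \in \partial u$, and $m = \max\{\mathbb{E}^+[A^2], \mathbb{E}^+[B^2]\}$.
We further upper bound the right hand side of (22) by bounding $\mathbb{E}^+[A^2]$, $\mathbb{E}^+[B^2]$, and their differences.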
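The elementary inequality $|(1+x)^{-1} - (1+y)^{-1}| \le \frac{1}{s}|x^s - y^s|$ used in (20) can be spot-checked numerically. The following minimal sketch tests it with $s = 1/4$ on a small grid of nonnegative values; the grid and the function names are hypothetical, and a grid check is of course only a sanity test, not a proof.

```python
import itertools

def lhs(x, y):
    """Left-hand side |1/(1+x) - 1/(1+y)|."""
    return abs(1.0 / (1.0 + x) - 1.0 / (1.0 + y))

def rhs(x, y, s):
    """Right-hand side |x^s - y^s| / s."""
    return abs(x ** s - y ** s) / s

# Hypothetical grid of nonnegative test points spanning several scales.
grid = [0.0, 1e-3, 0.1, 0.5, 1.0, 2.0, 10.0, 1e3]
s = 0.25
violations = [(x, y) for x, y in itertools.product(grid, repeat=2)
              if lhs(x, y) > rhs(x, y, s) + 1e-12]
print(violations)  # an empty list means no violation was found on the grid
```

With $s = 1/4$ the factor $1/s = 4$, together with the factor 2 in (20), gives the constant 8 and hence the constant 64 after squaring.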
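Lemma 5.5. There is a $\theta_1^* > 0$ such that if $|\theta| \le \theta_1^*$ and $\theta^2 d \ge C$ for a sufficiently large universal
constant $C$, then $m \le 1 - \theta^2/4$.

Proof. Note that $(1 - x + 5x^2/8)^2 (1 + x) = 1 - x + x^2/4 + o(x^2)$ when $x \to 0$. Thus, there exists
a universal constant $\theta_1^* > 0$ such that if $|x| \le \theta_1^*$, then $\sqrt{\frac{1-x}{1+x}} \le 1 - x + 5x^2/8$. It follows that for
$|\theta| \le \theta_1^*$ and $i \in \partial u$,

$$
\begin{aligned}
\mathbb{E}^+[A_i^2] = \mathbb{E}^+\left[ \sqrt{\frac{1 - \theta X_i^t}{1 + \theta X_i^t}} \right]
&\le 1 - \theta\, \mathbb{E}^+[X_i^t] + \frac{5\theta^2}{8} \mathbb{E}^+\left[ (X_i^t)^2 \right] \\
&\le 1 - \theta\, \mathbb{E}^+[X_i^t] + \frac{5\theta^2}{8},
\end{aligned}
$$

where in the last inequality, we used the fact that $|X_i^t| \le 1$. By definition,

$$
\mathbb{E}^+[X_i^t] = \mathbb{E}[X_i^t \mid \tau_u = +] = (1 - \eta)\, \mathbb{E}[X_i^t \mid \tau_i = +] + \eta\, \mathbb{E}[X_i^t \mid \tau_i = -] = (1 - 2\eta)\, \mathbb{E}^+[X_u^t],
$$

where the last equality holds because the distribution of $-X_i^t$ conditional on $\tau_i = -$ is the same as
the distribution of $X_i^t$ conditional on $\tau_i = +$, and both of them are the same as the distribution of
$X_u^t$ conditional on $\tau_u = +$. Hence, since $1 - 2\eta = \theta$,

$$
\mathbb{E}^+[A_i^2] \le 1 - \theta^2\, \mathbb{E}^+[X_u^t] + \frac{5\theta^2}{8}.
$$

In view of Lemma 5.3 and the assumption that $\theta^2 d \ge C$ for a sufficiently large universal constant
$C$, $\mathbb{E}^+[X_u^t] \ge 7/8$. Thus $\mathbb{E}^+[A_i^2] \le 1 - \theta^2/4$. Similarly, we can show $\mathbb{E}^+[B_i^2] \le 1 - \theta^2/4$ and the
lemma follows. □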
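The scalar bound $\sqrt{(1-x)/(1+x)} \le 1 - x + 5x^2/8$ that drives Lemma 5.5 holds only near zero. The sketch below checks it on a grid over $[-0.2, 0.2]$; the range $0.2$ is an illustrative choice made here, not the value of $\theta_1^*$ from the text, which is left unspecified.

```python
import math

def sqrt_ratio(x):
    """sqrt((1 - x) / (1 + x))."""
    return math.sqrt((1.0 - x) / (1.0 + x))

def quadratic_bound(x):
    """1 - x + 5 x^2 / 8."""
    return 1.0 - x + 5.0 * x * x / 8.0

# Hypothetical test range |x| <= 0.2, in steps of 0.001.
xs = [k / 1000.0 for k in range(-200, 201)]
worst_gap = min(quadratic_bound(x) - sqrt_ratio(x) for x in xs)
print(worst_gap)  # non-negative when the bound holds on the whole grid
```

Equality holds at $x = 0$, and the slack grows like $x^2/8$ for small $|x|$, which is where the term $\theta^2/4$ in the lemma comes from once $\mathbb{E}^+[X_u^t] \ge 7/8$.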


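The following lemma bounds $\mathbb{E}^+[A^2 - B^2]$ from above. It is proved in [36, Lemma 3.13]
and we provide a proof below for completeness.

Lemma 5.6. There is a universal constant $\theta_2^* > 0$ such that for all $|\theta| \le \theta_2^*$,

$$
\mathbb{E}^+[A^2 - B^2] \le 3\theta^2 \sqrt{\mathbb{E}^+\left[ (X_u^t - Y_u^t)^2 \right]}.
$$

Proof. For $i \in \partial u$, the distribution of $A_i$ conditional on $\tau_i = +$ is equal to the distribution of $A_i^{-1}$
conditional on $\tau_i = -$. Hence,

$$
\begin{aligned}
\mathbb{E}^+[A_i^2] &= (1 - \eta)\, \mathbb{E}[A_i^2 \mid \tau_i = +] + \eta\, \mathbb{E}[A_i^2 \mid \tau_i = -] \\
&= \mathbb{E}\left[ (1 - \eta) A_i^2 + \eta A_i^{-2} \mid \tau_i = + \right]. \qquad (23)
\end{aligned}
$$

Note that

$$
\begin{aligned}
(1 - \eta) A_i^2 + \eta A_i^{-2}
&= (1 - \eta) \sqrt{\frac{1 - \theta X_i^t}{1 + \theta X_i^t}} + \eta \sqrt{\frac{1 + \theta X_i^t}{1 - \theta X_i^t}} \\
&= \frac{(1 - \eta)(1 - \theta X_i^t) + \eta (1 + \theta X_i^t)}{\sqrt{(1 - \theta X_i^t)(1 + \theta X_i^t)}} \\
&= \frac{1 - \theta^2 X_i^t}{\sqrt{1 - \theta^2 (X_i^t)^2}}, \qquad (24)
\end{aligned}
$$

where we used the equality that $1 - 2\eta = \theta$. Let $f(x) = \frac{1 - \theta^2 x}{\sqrt{1 - \theta^2 x^2}}$. One can check that there exists
a universal constant $\theta_2^* > 0$ such that if $|\theta| \le \theta_2^*$, $|f'(x)| \le 3\theta^2$ for all $x \in [-1, 1]$. Therefore,

$$
\left| f(X_i^t) - f(Y_i^t) \right| \le 3\theta^2 \left| X_i^t - Y_i^t \right|.
$$

Combining the last displayed equation with (23) and (24) yields that

$$
\mathbb{E}^+[A_i^2 - B_i^2] \le 3\theta^2\, \mathbb{E}\left[ |X_i^t - Y_i^t| \mid \tau_i = + \right] \le 3\theta^2 \sqrt{\mathbb{E}\left[ (X_i^t - Y_i^t)^2 \mid \tau_i = + \right]}.
$$

Finally, in view of the fact that $\mathbb{E}[(X_i^t - Y_i^t)^2 \mid \tau_i = +] = \mathbb{E}^+[(X_u^t - Y_u^t)^2]$ and $\mathbb{E}^+[A_i^2 - B_i^2] = \mathbb{E}^+[A^2 - B^2]$, the lemma follows. □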

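Lemma 5.7. There is a universal constant $\theta_3^* > 0$ such that for all $|\theta| \le \theta_3^*$,

$$
\mathbb{E}^+\left[ (A - B)^2 \right] \le \theta^2\, \mathbb{E}^+\left[ (X_u^t - Y_u^t)^2 \right].
$$

Proof. Let $f(x) = \left( \frac{1 - \theta x}{1 + \theta x} \right)^{1/4}$ for $x \in [-1, 1]$. Then

$$
|f'(x)| = \frac{|\theta|}{2 (1 + \theta x)^{5/4} (1 - \theta x)^{3/4}} \le \frac{|\theta|}{2 (1 - |\theta|)^2}, \quad -1 \le x \le 1.
$$

Hence, there exists a universal constant $\theta_3^* > 0$ such that for all $|\theta| \le \theta_3^*$, $|f'(x)| \le |\theta|$ for all
$x \in [-1, 1]$. It follows that for $i \in \partial u$, $(A_i - B_i)^2 \le \theta^2 (X_i^t - Y_i^t)^2$. Therefore,

$$
\mathbb{E}^+\left[ (A - B)^2 \right] = \mathbb{E}^+\left[ (A_i - B_i)^2 \right] \le \theta^2\, \mathbb{E}^+\left[ (X_i^t - Y_i^t)^2 \right].
$$

By definition,

$$
\begin{aligned}
\mathbb{E}^+\left[ (X_i^t - Y_i^t)^2 \right]
&= (1 - \eta)\, \mathbb{E}\left[ (X_i^t - Y_i^t)^2 \mid \tau_i = + \right] + \eta\, \mathbb{E}\left[ (X_i^t - Y_i^t)^2 \mid \tau_i = - \right] \\
&= \mathbb{E}\left[ (X_i^t - Y_i^t)^2 \mid \tau_i = + \right] = \mathbb{E}^+\left[ (X_u^t - Y_u^t)^2 \right].
\end{aligned}
$$

The lemma follows by combining the last two displayed equations. □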


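Finally, we finish the proof of Proposition 5.4. Let $\theta^* = \min\{\theta_1^*, \theta_2^*, \theta_3^*\}$. It follows from
Lemma 5.5 that $m \le 1 - \theta^2/4$. Assembling (22), Lemma 5.6, and Lemma 5.7 yields that

$$
\begin{aligned}
\mathbb{E}^+\left[ \Big( \prod_{i \in \partial u} A_i - \prod_{i \in \partial u} B_i \Big)^2 \,\Big|\, \partial u \right]
&\le \left( \frac{9}{2} D^2 \theta^4 e^{-\theta^2 (D-2)/4} + D \theta^2 e^{-\theta^2 (D-1)/4} \right) \mathbb{E}^+\left[ (X_u^t - Y_u^t)^2 \right] \\
&\le c'\, e^{-\theta^2 D/8}\, \mathbb{E}^+\left[ (X_u^t - Y_u^t)^2 \right], \qquad (25)
\end{aligned}
$$

for some universal constant $c'$, where the last inequality holds because $D\theta^2 e^{-\theta^2 D/8}$ is bounded from
above by some universal constant. Since $D \sim \mathrm{Pois}(d)$, it follows that $\mathbb{E}[e^{sD}] = \exp(d(e^s - 1))$.
Thus $\mathbb{E}\left[ e^{-\theta^2 D/8} \right] = \exp\left( d (e^{-\theta^2/8} - 1) \right)$. Since $e^{-x} \le 1 - x/2$ for $x \in [0, 1/2]$, we have that
$\mathbb{E}\left[ e^{-\theta^2 D/8} \right] \le \exp(-d\theta^2/16)$. Hence

$$
\mathbb{E}^+\left[ \Big( \prod_{i \in \partial u} A_i - \prod_{i \in \partial u} B_i \Big)^2 \right] \le c'\, e^{-d\theta^2/16}\, \mathbb{E}^+\left[ (X_u^t - Y_u^t)^2 \right].
$$

Therefore, there exists a universal constant $C > 0$ such that if $d\theta^2 \ge C$, then

$$
\mathbb{E}^+\left[ \Big( \prod_{i \in \partial u} A_i - \prod_{i \in \partial u} B_i \Big)^2 \right] \le \frac{1}{128}\, \mathbb{E}^+\left[ (X_u^t - Y_u^t)^2 \right].
$$

Combining the last displayed equation together with (21) yields the desired (19) and the proposition
follows. □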
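The Poisson step at the end of the proof above relies on the moment generating function $\mathbb{E}[e^{sD}] = \exp(d(e^s - 1))$ and on the bound $e^{-x} \le 1 - x/2$ on $[0, 1/2]$. The minimal sketch below, with hypothetical values $d = 20$ and $\theta = 0.5$, sums the Poisson series directly and compares it with the closed form and with the upper bound $\exp(-d\theta^2/16)$.

```python
import math

def poisson_mgf_neg(d, s, terms=400):
    """E[exp(-s * D)] for D ~ Pois(d), summed term by term over the pmf."""
    total, pmf = 0.0, math.exp(-d)
    for k in range(terms):
        total += pmf * math.exp(-s * k)
        pmf *= d / (k + 1)  # pmf(k + 1) = pmf(k) * d / (k + 1)
    return total

d, theta = 20.0, 0.5        # hypothetical values for illustration
s = theta ** 2 / 8.0
series = poisson_mgf_neg(d, s)
closed_form = math.exp(d * (math.exp(-s) - 1.0))
upper = math.exp(-d * theta ** 2 / 16.0)
print(series, closed_form, upper)
```

The truncation at 400 terms is more than enough for $d = 20$; larger $d$ would need more terms or a library routine.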


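5.3 Large $|\theta|$ regime

In this subsection, we focus on the regime where $|\theta| \ge \theta^*$. In this regime, to prove Theorem 5.1, it
is sufficient to prove the following proposition.

Proposition 5.8. Assume $1/c_0 \le a/b \le c_0$ for some positive constant $c_0$. For any $\theta^* \in (0, 1)$,
there exists a $d^* = d^*(\theta^*, c_0)$ such that for all $|\theta| \ge \theta^*$, $d \ge d^*$, and $t \ge t^*(\theta, d, \alpha)$,

$$
\mathbb{E}\left[ \sqrt{|X_u^{t+1} - Y_u^{t+1}|} \right] \le \sqrt{\alpha(1-\alpha)}\, \mathbb{E}\left[ \sqrt{|X_u^t - Y_u^t|} \right].
$$

Proof. Note that $\mathbb{E}\left[ \sqrt{|X_u^t - Y_u^t|} \right] = \mathbb{E}^+\left[ \sqrt{|X_u^t - Y_u^t|} \right] = \mathbb{E}^-\left[ \sqrt{|X_u^t - Y_u^t|} \right]$. Thus it suffices to
show that

$$
\mathbb{E}^+\left[ \sqrt{|X_u^{t+1} - Y_u^{t+1}|} \right] \le \sqrt{\alpha(1-\alpha)}\, \mathbb{E}^+\left[ \sqrt{|X_u^t - Y_u^t|} \right]. \qquad (26)
$$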

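Fix $t^*(\theta, d, \alpha)$ as per Lemma 5.3 and consider $t \ge t^*$. Let $g_1 = \prod_{i \in \partial u} (1 + \theta x_i)$, $g_2 = \prod_{i \in \partial u} (1 - \theta x_i)$,
and $g(\{x_i\}_{i \in \partial u}) = \frac{e^{h_u} g_1 - e^{-h_u} g_2}{e^{h_u} g_1 + e^{-h_u} g_2}$. Then $X_u^{t+1} = g(\{X_i^t\}_{i \in \partial u})$ and $Y_u^{t+1} = g(\{Y_i^t\}_{i \in \partial u})$. Note that
$\frac{\partial g_1}{\partial x_i} = \frac{\theta g_1}{1 + \theta x_i}$ and $\frac{\partial g_2}{\partial x_i} = -\frac{\theta g_2}{1 - \theta x_i}$. It follows that for any $i \in \partial u$,

$$
\left| \frac{\partial g}{\partial x_i} \right| = \frac{4 |\theta|\, g_1 g_2}{(1 - \theta^2 x_i^2)(e^{h_u} g_1 + e^{-h_u} g_2)^2} \le \frac{4 |\theta|\, g_2}{(1 - \theta^2 x_i^2)\, e^{2h_u} g_1} = e^{-2h_u} \frac{4 |\theta|}{(1 + \theta x_i)^2} \prod_{j \in \partial u : j \ne i} \frac{1 - \theta x_j}{1 + \theta x_j},
$$

where the inequality holds due to $g_2 \ge 0$, and the last equality holds by the definition of $g_1$ and $g_2$.
By assumption, $\frac{1}{c_0} \le \frac{a}{b} \le c_0$ and thus $|\theta| \le \frac{c_0 - 1}{c_0 + 1}$. Since $(1 + \theta x_i) \ge 1 - |\theta| \ge \frac{2}{c_0 + 1}$ for $|x_i| \le 1$, it
follows that for any $i \in \partial u$,

$$
\left| \frac{\partial g}{\partial x_i}(x) \right| \le (c_0 + 1)^2 e^{-2h_u} \prod_{j \in \partial u : j \ne i} \frac{1 - \theta x_j}{1 + \theta x_j} \triangleq (c_0 + 1)^2 e^{-2h_u} r_i(x).
$$

Note that $r_i(x)$ is convex in $x$ and thus for any $x, y \in [-1, 1]^{|\partial u|}$ and any $0 \le \delta \le 1$,

$$
\left| \frac{\partial g}{\partial x_i}(\delta x + (1 - \delta) y) \right| \le (c_0 + 1)^2 e^{-2h_u} r_i(\delta x + (1 - \delta) y) \le (c_0 + 1)^2 e^{-2h_u} \max\{r_i(x), r_i(y)\}.
$$

It follows that

$$
|g(x) - g(y)| \le (c_0 + 1)^2 e^{-2h_u} \sum_{i \in \partial u} |x_i - y_i| \max\{r_i(x), r_i(y)\}.
$$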
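Hence

$$
\left| X_u^{t+1} - Y_u^{t+1} \right| \le (c_0 + 1)^2 e^{-2h_u} \sum_{i \in \partial u} \left| X_i^t - Y_i^t \right| \max\{r_i(X), r_i(Y)\}.
$$

Note that $r_i(X), r_i(Y)$ are functions of $\{X_j^t, Y_j^t\}$ for all $j \in \partial u \setminus \{i\}$, and thus $r_i(X), r_i(Y)$ are
independent of $X_i^t$ and $Y_i^t$ conditional on $\tau_u$ and $\partial u$. Moreover, $h_u$ is independent of $\{X_i^t, Y_i^t\}_{i \in \partial u}$
conditional on $\tau_u$ and $\partial u$. Thus it follows from the last displayed equation that

$$
\begin{aligned}
\mathbb{E}^+\left[ \sqrt{|X_u^{t+1} - Y_u^{t+1}|} \,\Big|\, \partial u \right]
&\le (c_0 + 1)\, \mathbb{E}^+\left[ e^{-h_u} \right] \sum_{i \in \partial u} \mathbb{E}^+\left[ \sqrt{|X_i^t - Y_i^t|} \right] \mathbb{E}^+\left[ \max\left\{ \sqrt{r_i(X)}, \sqrt{r_i(Y)} \right\} \right] \\
&= 2 (c_0 + 1) \sqrt{\alpha(1-\alpha)} \sum_{i \in \partial u} \mathbb{E}^+\left[ \sqrt{|X_i^t - Y_i^t|} \right] \mathbb{E}^+\left[ \max\left\{ \sqrt{r_i(X)}, \sqrt{r_i(Y)} \right\} \right] \\
&= 2 (c_0 + 1) \sqrt{\alpha(1-\alpha)}\, D\, \mathbb{E}^+\left[ \sqrt{|X_u^t - Y_u^t|} \right] \mathbb{E}^+\left[ \max\left\{ \sqrt{r(X)}, \sqrt{r(Y)} \right\} \right], \qquad (27)
\end{aligned}
$$

where $D = |\partial u|$; $(r(X), r(Y))$ has the same distribution as $(r_i(X), r_i(Y))$ for $i \in \partial u$; the first
equality follows due to $\mathbb{E}^+[e^{-h_u}] = 2\sqrt{\alpha(1-\alpha)}$; the last equality holds because for all $i \in \partial u$,

$$
\begin{aligned}
\mathbb{E}^+\left[ \sqrt{|X_i^t - Y_i^t|} \right]
&= (1 - \eta)\, \mathbb{E}\left[ \sqrt{|X_i^t - Y_i^t|} \,\Big|\, \tau_i = + \right] + \eta\, \mathbb{E}\left[ \sqrt{|X_i^t - Y_i^t|} \,\Big|\, \tau_i = - \right] \\
&= \mathbb{E}\left[ \sqrt{|X_i^t - Y_i^t|} \,\Big|\, \tau_i = + \right] = \mathbb{E}^+\left[ \sqrt{|X_u^t - Y_u^t|} \right].
\end{aligned}
$$

To proceed, we need to bound $\mathbb{E}^+\left[ \max\left\{ \sqrt{r(X)}, \sqrt{r(Y)} \right\} \right]$ from above.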


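Lemma 5.9. Assume $1/c_0 \le a/b \le c_0$ for some positive constant $c_0$. For any $0 < \theta^* < 1$, there
exists a $d^* = d^*(\theta^*, c_0)$ and a $\lambda = \lambda(\theta^*) \in (0, 1)$ such that for all $|\theta| \ge \theta^*$, $d \ge d^*$, $t \ge t^*(\theta, d, \alpha)$,
and $i \in \partial u$,

$$
\mathbb{E}^+\left[ \sqrt{\frac{1 - \theta X_i^t}{1 + \theta X_i^t}} \right] \le \lambda,
$$

and the same is also true for $Y_i^t$.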

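Proof. By assumption $\frac{1}{c_0} \le \frac{a}{b} \le c_0$, it follows that $|\theta| \le \frac{c_0 - 1}{c_0 + 1}$ and $\frac{1}{c_0 + 1} \le \eta \le \frac{c_0}{c_0 + 1}$. We prove the
claim for $X_i^t$; the claim for $Y_i^t$ follows similarly. Fix $\epsilon = \epsilon(\theta^*, c_0) < 1$ to be determined later. It
follows from Lemma 5.3 and the Markov inequality that for all $t \ge t^*(\theta, d, \alpha)$,

$$
\mathbb{P}\{1 - X_u^t \ge \epsilon \mid \tau_u = +\} \le \frac{1 - \mathbb{E}^+[X_u^t]}{\epsilon} \le \frac{2C'\eta}{\epsilon \theta^2 d} + \frac{2e^{-C'd}}{\epsilon},
$$

where $C'$ is a universal constant. Hence, there exists a $d^* = d^*(\theta^*, c_0)$ such that for all $d \ge d^*$ and
$|\theta| \ge \theta^*$, $\mathbb{P}\{1 - X_u^t \ge \epsilon \mid \tau_u = +\} \le \epsilon$, i.e., $\mathbb{P}\{X_u^t \ge 1 - \epsilon \mid \tau_u = +\} \ge 1 - \epsilon$. Since the distribution of
$X_u^t$ conditional on $\tau_u$ is the same as the distribution of $X_i^t$ conditional on $\tau_i$ for $i \in \partial u$, it follows
that $\mathbb{P}\{X_i^t \ge 1 - \epsilon \mid \tau_i = +\} \ge 1 - \epsilon$. By symmetry, the distribution of $X_i^t$ conditional on $\tau_i = -$ is
the same as the distribution of $-X_i^t$ conditional on $\tau_i = +$, and thus $\mathbb{P}\{X_i^t \le \epsilon - 1 \mid \tau_i = -\} \ge 1 - \epsilon$.
Hence,

$$
\begin{aligned}
\mathbb{P}\{X_i^t \ge 1 - \epsilon \mid \tau_u = +\}
&\ge \mathbb{P}\{X_i^t \ge 1 - \epsilon, \tau_i = + \mid \tau_u = +\} \\
&= \mathbb{P}\{\tau_i = + \mid \tau_u = +\}\, \mathbb{P}\{X_i^t \ge 1 - \epsilon \mid \tau_i = +\} \\
&\ge (1 - \eta)(1 - \epsilon) \ge 1 - \eta - \epsilon.
\end{aligned}
$$

Similarly,

$$
\begin{aligned}
\mathbb{P}\{X_i^t \le \epsilon - 1 \mid \tau_u = +\}
&\ge \mathbb{P}\{X_i^t \le \epsilon - 1, \tau_i = - \mid \tau_u = +\} \\
&= \mathbb{P}\{\tau_i = - \mid \tau_u = +\}\, \mathbb{P}\{X_i^t \le \epsilon - 1 \mid \tau_i = -\} \\
&\ge \eta (1 - \epsilon) \ge \eta - \epsilon.
\end{aligned}
$$

Let $f(x) = \sqrt{\frac{1 - \theta x}{1 + \theta x}}$. Then $f$ is non-increasing in $x$ if $\theta \ge 0$ and non-decreasing if $\theta < 0$. It follows
that

$$
\begin{aligned}
\mathbb{E}^+[f(X_i^t)] &\le f(1 - \epsilon)(1 - \eta - \epsilon) + f(-1)(\eta + \epsilon), \quad \theta \ge 0, \\
\mathbb{E}^+[f(X_i^t)] &\le f(1)(1 - \eta + \epsilon) + f(\epsilon - 1)(\eta - \epsilon), \quad \theta < 0.
\end{aligned}
$$

Notice that

$$
\begin{aligned}
f(1 - \epsilon) &\le f(1) + \epsilon \sup_{x \in [1 - \epsilon, 1]} |f'(x)| \le f(1) + \frac{(c_0 + 1)^2 \epsilon}{4}, \\
f(\epsilon - 1) &\le f(-1) + \epsilon \sup_{x \in [-1, \epsilon - 1]} |f'(x)| \le f(-1) + \frac{(c_0 + 1)^2 \epsilon}{4},
\end{aligned}
$$

where the last inequality holds because for $x \in [-1, 1]$,

$$
|f'(x)| = \frac{|\theta|}{(1 + \theta x)^{3/2} (1 - \theta x)^{1/2}} \le \frac{|\theta|}{(1 - |\theta|)^2} \le \frac{(c_0 + 1)^2}{4}.
$$

Hence,

$$
\begin{aligned}
f(1 - \epsilon)(1 - \eta - \epsilon) + f(-1)(\eta + \epsilon)
&\le f(1)(1 - \eta) + f(-1)\eta + \frac{(c_0 + 1)^2 \epsilon}{4} + \epsilon f(-1) \\
&= 2\sqrt{\eta(1 - \eta)} + \frac{(c_0 + 1)^2 \epsilon}{4} + \epsilon f(-1) \\
&\le \sqrt{1 - (\theta^*)^2} + \frac{(c_0 + 1)^2 \epsilon}{4} + \sqrt{c_0}\, \epsilon,
\end{aligned}
$$

where the last inequality holds because $\eta = \frac{1 - \theta}{2}$, $|\theta| \ge \theta^*$, and $f(-1) \le \sqrt{c_0}$. Similarly,

$$
\begin{aligned}
f(1)(1 - \eta + \epsilon) + f(\epsilon - 1)(\eta - \epsilon)
&\le f(1)(1 - \eta + \epsilon) + f(-1)\eta + \frac{(c_0 + 1)^2 \epsilon}{4} \\
&= 2\sqrt{\eta(1 - \eta)} + \frac{(c_0 + 1)^2 \epsilon}{4} + \epsilon f(1) \\
&\le \sqrt{1 - (\theta^*)^2} + \frac{(c_0 + 1)^2 \epsilon}{4} + \sqrt{c_0}\, \epsilon.
\end{aligned}
$$

In conclusion, for both the case $\theta \ge 0$ and the case $\theta < 0$, we have shown that

$$
\mathbb{E}^+[f(X_i^t)] \le \sqrt{1 - (\theta^*)^2} + \left( \frac{(c_0 + 1)^2}{4} + \sqrt{c_0} \right) \epsilon.
$$

Therefore, there exists an $\epsilon^* = \epsilon^*(\theta^*, c_0)$ such that $\mathbb{E}^+[f(X_i^t)] \le \lambda$ for some $\lambda = \lambda(\theta^*) \in (0, 1)$. □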
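The leading term in the bound of Lemma 5.9 comes from the identity $f(1)(1-\eta) + f(-1)\eta = 2\sqrt{\eta(1-\eta)} = \sqrt{1 - \theta^2}$ with $\eta = (1-\theta)/2$. A minimal numerical check, using hypothetical values $a = 6$, $b = 2$ and the relation $\theta = \tanh(\beta) = (a-b)/(a+b)$, is sketched below.

```python
import math

def f(x, theta):
    """f(x) = sqrt((1 - theta * x) / (1 + theta * x))."""
    return math.sqrt((1.0 - theta * x) / (1.0 + theta * x))

a, b = 6.0, 2.0                     # hypothetical values for illustration
theta = (a - b) / (a + b)           # equals tanh(beta) with beta = log(a/b) / 2
eta = (1.0 - theta) / 2.0           # equals b / (a + b)

lead = f(1.0, theta) * (1.0 - eta) + f(-1.0, theta) * eta
two_root = 2.0 * math.sqrt(eta * (1.0 - eta))
root = math.sqrt(1.0 - theta ** 2)
print(lead, two_root, root)
print(f(-1.0, theta), math.sqrt(a / b))  # f(-1) equals sqrt(a / b)
```

Since $\sqrt{1 - \theta^2} < 1$ whenever $\theta \ne 0$, there is room to absorb the $\epsilon$ terms and still end up with a constant $\lambda < 1$.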
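Finally, we finish the proof of Proposition 5.8. It follows from Lemma 5.9 that

$$
\mathbb{E}^+\left[ \max\left\{ \sqrt{r(X)}, \sqrt{r(Y)} \right\} \right] \le \mathbb{E}^+\left[ \sqrt{r(X)} + \sqrt{r(Y)} \right] \le 2\lambda^{D-1}.
$$

Combining the last displayed equation with (27) yields

$$
\mathbb{E}^+\left[ \sqrt{|X_u^{t+1} - Y_u^{t+1}|} \,\Big|\, \partial u \right] \le 4 (c_0 + 1) \sqrt{\alpha(1-\alpha)}\, D \lambda^{D-1}\, \mathbb{E}^+\left[ \sqrt{|X_u^t - Y_u^t|} \right].
$$

Thus,

$$
\mathbb{E}^+\left[ \sqrt{|X_u^{t+1} - Y_u^{t+1}|} \right] \le 4 (c_0 + 1) \sqrt{\alpha(1-\alpha)}\, \mathbb{E}\left[ D \lambda^{D-1} \right] \mathbb{E}^+\left[ \sqrt{|X_u^t - Y_u^t|} \right].
$$

By the Cauchy-Schwarz inequality and $D \sim \mathrm{Pois}(d)$,

$$
\mathbb{E}\left[ D \lambda^D \right] \le \sqrt{\mathbb{E}[D^2]} \sqrt{\mathbb{E}[\lambda^{2D}]} = \sqrt{d^2 + d}\; e^{-d(1 - \lambda^2)/2}.
$$

Combining the last two displayed equations, we get that

$$
\mathbb{E}^+\left[ \sqrt{|X_u^{t+1} - Y_u^{t+1}|} \right] \le \frac{4 (c_0 + 1)}{\lambda} \sqrt{d^2 + d}\; e^{-d(1 - \lambda^2)/2} \sqrt{\alpha(1-\alpha)}\, \mathbb{E}^+\left[ \sqrt{|X_u^t - Y_u^t|} \right].
$$

Since $\lambda = \lambda(\theta^*) \in (0, 1)$, there exists a $d^*(\theta^*, c_0)$ such that for all $d \ge d^*$, the desired (26) holds
and hence the proposition follows. □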
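For a Poisson variable the quantity $\mathbb{E}[D\lambda^D]$ bounded above by Cauchy-Schwarz also has the exact value $\lambda d\, e^{-d(1-\lambda)}$, which follows by summing the Poisson series. The sketch below compares the exact value with the Cauchy-Schwarz bound $\sqrt{d^2 + d}\, e^{-d(1-\lambda^2)/2}$ for a hypothetical $\lambda = 0.8$ and a few values of $d$; both tend to zero as $d$ grows, which is all the proof needs.

```python
import math

def exact_mean(d, lam):
    """E[D * lam**D] for D ~ Pois(d), equal to lam * d * exp(-d * (1 - lam))."""
    return lam * d * math.exp(-d * (1.0 - lam))

def cauchy_schwarz_bound(d, lam):
    """sqrt(E[D^2]) * sqrt(E[lam^(2D)]) = sqrt(d^2 + d) * exp(-d * (1 - lam^2) / 2)."""
    return math.sqrt(d * d + d) * math.exp(-d * (1.0 - lam * lam) / 2.0)

lam = 0.8  # hypothetical value of lambda in (0, 1)
rows = [(d, exact_mean(d, lam), cauchy_schwarz_bound(d, lam))
        for d in (5.0, 20.0, 80.0)]
for d, exact, bound in rows:
    print(d, exact, bound)
```

The bound is looser than the exact value in both the prefactor and the exponent, but it still decays exponentially in $d$.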


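6 Optimality of local BP when $\alpha$ is small

Theorem 6.1. There exists a constant $0 < \alpha^* < 1/2$ depending only on $c_0$ such that if $\alpha \le \alpha^*$,
then

$$
\lim_{t \to \infty} \limsup_{n \to \infty} \mathbb{E}\left[ |X_u^t - Y_u^t| \right] = 0.
$$

Proof. The proof is similar to the proof of Theorem 5.1 and is divided into three steps. Thus we
only provide proof sketches below.

We first show that for all $t \ge 0$,

$$
\min\left\{ \mathbb{E}^+[X_u^t], \mathbb{E}^+[Y_u^t] \right\} \ge 1 - 4\alpha. \qquad (28)
$$

In particular, given access to $T^t$, $\tau_{\partial T^t}$ and $\tilde\tau_{T^t}$, the optimal estimator of $\tau_u$ is $\mathrm{sgn}(X_u^t)$, whose success
probability is $(1 + \mathbb{E}[|X_u^t|])/2$. For the estimator $\tilde\tau_u$, its success probability is $1 - \alpha$. It follows
that $\mathbb{E}[|X_u^t|] \ge 1 - 2\alpha$. In view of Lemma 5.2, $\mathbb{E}^+[X_u^t] \ge 2\mathbb{E}[|X_u^t|] - 1 \ge 1 - 4\alpha$. By the same
argument, $\mathbb{E}^+[Y_u^t] \ge 1 - 4\alpha$.

We next focus on the case $|\theta| \le \theta^*$, where $\theta^*$ is a small constant to be specified later. To prove
the theorem in this case, it is sufficient to show that

$$
\mathbb{E}^+\left[ (X_u^{t+1} - Y_u^{t+1})^2 \right] \le C \sqrt{\alpha(1-\alpha)}\, \mathbb{E}^+\left[ (X_u^t - Y_u^t)^2 \right] \qquad (29)
$$

for some explicit universal constant $C$. In the proof of Proposition 5.4, we have shown that

$$
\mathbb{E}^+\left[ (X_u^{t+1} - Y_u^{t+1})^2 \mid \partial u \right] \le 128 \sqrt{\alpha(1-\alpha)}\, \mathbb{E}^+\left[ \Big( \prod_{i=1}^{D} A_i - \prod_{i=1}^{D} B_i \Big)^2 \,\Big|\, \partial u \right]
$$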
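and

$$
\mathbb{E}^+\left[ \Big( \prod_{i=1}^{D} A_i - \prod_{i=1}^{D} B_i \Big)^2 \,\Big|\, \partial u \right] \le \binom{D}{2} m^{D-2} \left( \mathbb{E}^+[A^2 - B^2] \right)^2 + D m^{D-1}\, \mathbb{E}^+\left[ (A - B)^2 \right].
$$

Following the proof of Lemma 5.5, one can check that there exists a $\theta_1^*$ such that if $|\theta| \le \theta_1^*$ and
$\alpha \le 1/32$, then $m \le 1 - \frac{\theta^2}{4} \le e^{-\theta^2/4}$. In view of Lemma 5.6 and Lemma 5.7, there exist $\theta_2^*$ and $\theta_3^*$
such that if $|\theta| \le \theta^* = \min\{\theta_1^*, \theta_2^*, \theta_3^*\}$, then

$$
\begin{aligned}
\mathbb{E}^+\left[ \Big( \prod_{i=1}^{D} A_i - \prod_{i=1}^{D} B_i \Big)^2 \,\Big|\, \partial u \right]
&\le \left( \frac{9}{2} D^2 \theta^4 e^{-\theta^2 (D-2)/4} + D \theta^2 e^{-\theta^2 (D-1)/4} \right) \mathbb{E}^+\left[ (X_u^t - Y_u^t)^2 \right] \\
&\le C'\, \mathbb{E}^+\left[ (X_u^t - Y_u^t)^2 \right],
\end{aligned}
$$

where $C'$ is a universal constant and the last inequality holds because $D\theta^2 e^{-\theta^2 (D-1)/4}$ and $D^2 \theta^4 e^{-\theta^2 (D-2)/4}$
are bounded by a universal constant from above. Combining the last three displayed equations yields
the desired (29).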

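Finally, we consider the case $|\theta| \ge \theta^*$. In this case, to prove the theorem, it suffices to show
that

$$
\mathbb{E}^+\left[ \sqrt{|X_u^{t+1} - Y_u^{t+1}|} \right] \le C'(c_0) \sqrt{\alpha(1-\alpha)}\, \mathbb{E}^+\left[ \sqrt{|X_u^t - Y_u^t|} \right] \qquad (30)
$$

for some explicit constant $C'$ depending only on $c_0$. In the proof of Proposition 5.8, we have shown
that (27) holds, which gives

$$
\mathbb{E}^+\left[ \sqrt{|X_u^{t+1} - Y_u^{t+1}|} \,\Big|\, \partial u \right] \le 2 (c_0 + 1) \sqrt{\alpha(1-\alpha)}\, D\, \mathbb{E}^+\left[ \sqrt{|X_u^t - Y_u^t|} \right] \mathbb{E}^+\left[ \max\left\{ \sqrt{r(X)}, \sqrt{r(Y)} \right\} \right],
$$

where $(r(X), r(Y))$ has the same distribution as $(r_i(X), r_i(Y))$ for $i \in \partial u$, and $r_i(x) = \prod_{j \in \partial u : j \ne i} \frac{1 - \theta x_j}{1 + \theta x_j}$.
Following the proof of Lemma 5.9, one can verify that there exists an $\alpha^*(\theta^*, c_0)$ and a $\lambda(\theta^*) \in (0, 1)$
such that if $\alpha \le \alpha^*$, then for all $i \in \partial u$,

$$
\max\left\{ \mathbb{E}^+\left[ \sqrt{\frac{1 - \theta X_i^t}{1 + \theta X_i^t}} \right], \mathbb{E}^+\left[ \sqrt{\frac{1 - \theta Y_i^t}{1 + \theta Y_i^t}} \right] \right\} \le \lambda.
$$

It follows that

$$
\mathbb{E}^+\left[ \max\left\{ \sqrt{r(X)}, \sqrt{r(Y)} \right\} \right] \le \mathbb{E}^+\left[ \sqrt{r(X)} + \sqrt{r(Y)} \right] \le 2\lambda^{D-1}.
$$

Combining the last three displayed equations yields

$$
\mathbb{E}^+\left[ \sqrt{|X_u^{t+1} - Y_u^{t+1}|} \,\Big|\, \partial u \right] \le 4 (c_0 + 1) \sqrt{\alpha(1-\alpha)}\, D \lambda^{D-1}\, \mathbb{E}^+\left[ \sqrt{|X_u^t - Y_u^t|} \right],
$$

which implies the desired (30) holds, because $D\lambda^{D-1}$ is bounded by a universal constant from
above. □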
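7 Density evolution in the large degree regime

In this section, we consider the regime (3) and further assume that as $n \to \infty$,

$$
a \to \infty, \qquad \frac{a - b}{\sqrt{b}} \to \mu,
$$

where $\mu$ is a fixed constant. For $t \ge 1$, define

$$
\Phi_u^t = \sum_{\ell \in \partial u} F(\Phi_\ell^{t-1} + h_\ell), \qquad (31)
$$

$$
\Psi_u^t = \sum_{\ell \in \partial u} F(\Psi_\ell^{t-1} + h_\ell), \qquad (32)
$$

where $\Phi_u^0 = \infty$ if $\tau_u = +$ and $\Phi_u^0 = -\infty$ if $\tau_u = -$, and $\Psi_u^0 = 0$ for all $u$. Then $\Lambda_u^t = \Phi_u^t + h_u$
and $\Gamma_u^t = \Psi_u^t + h_u$ for all $t \ge 0$. Notice that subtrees $\{T_\ell^t\}_{\ell \in \partial u}$ are independent and identically
distributed conditional on $\tau_u$. Thus $\{\Phi_\ell^{t-1}\}_{\ell \in \partial u}$ ($\{\Psi_\ell^{t-1}\}_{\ell \in \partial u}$) are independent and identically
distributed conditional on $\tau_u$. As a consequence, when the expected degree of $u$ tends to infinity,
due to the central limit theorem, we expect that the distribution of $\Phi_u^t$ ($\Psi_u^t$) conditional on $\tau_u$ is
approximately Gaussian.

Let $W_+^t$ ($Z_+^t$) denote a random variable that has the same distribution as $\Phi_u^t$ ($\Psi_u^t$) conditional
on $\tau_u = +$, and $W_-^t$ ($Z_-^t$) denote a random variable that has the same distribution as $\Phi_u^t$ ($\Psi_u^t$)
conditional on $\tau_u = -$. We are going to prove that the distributions of $W_+^t$ ($Z_+^t$) and $W_-^t$ ($Z_-^t$) are
asymptotically Gaussian. The following lemma provides expressions of the mean and variance of
$Z_+^t$ and $Z_-^t$.

Lemma 7.1. For all $t \ge 0$,

$$
\mathbb{E}\left[ Z_\pm^{t+1} \right] = \pm \frac{\mu^2}{4}\, \mathbb{E}\left[ \tanh(Z_+^t + U) \right] + O(a^{-1/2}), \qquad (33)
$$

$$
\mathrm{var}\left( Z_\pm^{t+1} \right) = \frac{\mu^2}{4}\, \mathbb{E}\left[ \tanh(Z_+^t + U) \right] + O(a^{-1/2}). \qquad (34)
$$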

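Proof. By symmetry, the distribution of $\Psi_u^{t+1}$ conditional on $\tau_u = -$ is the same as the distribution
of $-\Psi_u^{t+1}$ conditional on $\tau_u = +$. Thus, $\mathbb{E}[Z_+^{t+1}] = -\mathbb{E}[Z_-^{t+1}]$ and $\mathrm{var}(Z_+^{t+1}) = \mathrm{var}(Z_-^{t+1})$. Hence,
it suffices to prove the claims for $Z_-^{t+1}$. By the definition of $\Gamma_u^t$ and the change of measure, we have
that

$$
\mathbb{E}\left[ g(\Gamma_u^t) \mid \tau_u = - \right] = \mathbb{E}\left[ g(\Gamma_u^t)\, e^{-2\Gamma_u^t} \mid \tau_u = + \right],
$$

where $g$ is any measurable function such that the expectations above are well-defined. Recall that
$\Gamma_u^t = h_u + \Psi_u^t$. Hence, the distribution of $\Gamma_u^t$ conditional on $\tau_u = +$ is the same as the distribution
of $U + Z_+^t$; the distribution of $\Gamma_u^t$ conditional on $\tau_u = -$ is the same as the distribution of $-U + Z_-^t$.
It follows that

$$
\mathbb{E}\left[ g(Z_-^t - U) \right] = \mathbb{E}\left[ g(Z_+^t + U)\, e^{-2(Z_+^t + U)} \right]. \qquad (35)
$$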
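Define $\psi(x) = \log(1 + x) - x + x^2/2$. It follows from the Taylor expansion that $|\psi(x)| \le |x|^3$. Then

$$
\begin{aligned}
F(x) &= \frac{1}{2} \log \left( \frac{e^{2x + 2\beta} + 1}{e^{2x} + e^{2\beta}} \right) \\
&= -\beta + \frac{1}{2} \log \left( \frac{e^{2x + 2\beta} + 1}{e^{2x - 2\beta} + 1} \right) \\
&= -\beta + \frac{1}{2} \log \left( 1 + \frac{e^{4\beta} - 1}{1 + e^{-2(x - \beta)}} \right) \\
&= -\beta + \frac{e^{4\beta} - 1}{2} f(x) - \frac{(e^{4\beta} - 1)^2}{4} f^2(x) + \frac{1}{2} \psi\left( (e^{4\beta} - 1) f(x) \right),
\end{aligned}
$$

where $f(x) = \frac{1}{1 + e^{-2(x - \beta)}}$. Since $|\psi(x)| \le |x|^3$ and $|f(x)| \le 1$, it follows that

$$
F(x) = -\beta + \frac{e^{4\beta} - 1}{2} f(x) - \frac{(e^{4\beta} - 1)^2}{4} f^2(x) + O\left( |e^{4\beta} - 1|^3 \right). \qquad (36)
$$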

Therefore, 

— 1 


iGdu 


E 

ladu . 


-/? + 


('p4/3_it2 

+ h,) - l_^/2(xl,3 +h,)+0 (|e"^ - 1|3 


By conditioning the label of vertex u is —, it follows that 

,t+ii + - 1 


E ^ 


{lM[f{ZX + U)] +aE [f{Zl-U)]) 


- {b^ [fiZi + U)] + aE [f{Zf - [/))]) + O (b\e^^ - 1|' 

In view of (35), we have that 

6E [f{ZX + U)] + aE [f{Zt - U)] = bE f/(Z^ + U){1 + ^ 


(37) 


bE [f{Zl + U)] + aE [fiZf - U)] = bE \p{ZX + U){1 + e-2(^;+t3-/3))l = 5 ^ [/(Z^ + U)] 


(38) 


Hence, 


Notice that 


E [Z(+i] = -/3^ + [f{Z\. + U)]+0 {b\e^^ - Ip 

a — b\^' 


n P-, a — b\ a — b (a — b)‘ „ 

-/3 = --log 1 + I =-^ ^ + O 


2 ° V 
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462 


63 


( 39 ) 


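As a consequence,
\[
\begin{aligned}
-\beta \frac{a+b}{2} + \frac{b (e^{4\beta} - 1)}{4}
&= -\frac{a^2 - b^2}{4b} + \frac{(a-b)^2 (a+b)}{8b^2} + \frac{a^2 - b^2}{4b} + O\left( \frac{|a-b|^3}{b^2} \right) \\
&= \frac{(a-b)^2}{4b} + O\left( \frac{|a-b|^3}{b^2} \right) = \frac{\mu^2}{4} + O(a^{-1/2}),
\end{aligned}
\]
where the last equality holds due to $(a-b)/\sqrt{b} = \mu$ for a fixed constant $\mu$. Moreover,
\[
\frac{b (e^{4\beta} - 1)^2}{8} = \frac{(a-b)^2 (a+b)^2}{8 b^3} = \frac{\mu^2}{2} + O(a^{-1/2}),
\]
and
\[
b |e^{4\beta} - 1|^3 = O\left( \frac{|a-b|^3}{b^2} \right) = O(a^{-1/2}).
\]
Assembling the last five displayed equations gives that
\[
\mathbb{E}\left[ Z_-^{t+1} \right] = \frac{\mu^2}{4} - \frac{\mu^2}{2} \mathbb{E}\left[ f(Z_+^t + U) \right] + O(a^{-1/2}).
\]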


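Finally, notice that
\[
\left| f(x) - \frac{1}{1 + e^{-2x}} \right| = \frac{e^{-2x} |e^{2\beta} - 1|}{(1 + e^{-2(x-\beta)})(1 + e^{-2x})} \le |e^{2\beta} - 1| = O(a^{-1/2}).
\]
It follows that
\[
\begin{aligned}
\mathbb{E}\left[ Z_-^{t+1} \right] &= \frac{\mu^2}{4} - \frac{\mu^2}{2} \mathbb{E}\left[ \frac{1}{1 + e^{-2(Z_+^t + U)}} \right] + O(a^{-1/2}) \\
&= -\frac{\mu^2}{4} \mathbb{E}\left[ \tanh(Z_+^t + U) \right] + O(a^{-1/2}).
\end{aligned} \tag{40}
\]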


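Next we calculate $\mathrm{var}(Z_-^{t+1})$. For $Y = \sum_{i=1}^{L} X_i$, where $L$ is Poisson distributed and $\{X_i\}$ are i.i.d. with finite second moments, one can check that $\mathrm{var}(Y) = \mathbb{E}[L]\, \mathbb{E}[X_1^2]$. Since $Z_-^{t+1}$ has the distribution of $\sum_{\ell \in \partial u} F(\Gamma_\ell^t)$ conditional on $\tau_u = -$, it follows that
\[
\mathrm{var}(Z_-^{t+1}) = \frac{b}{2} \mathbb{E}\left[ F^2(Z_+^t + U) \right] + \frac{a}{2} \mathbb{E}\left[ F^2(Z_-^t - U) \right].
\]
In view of (36) and the fact that $e^{4\beta} - 1 = o(1)$, we have that
\[
F^2(x) = \beta^2 - (e^{4\beta} - 1) \beta f(x) + \frac{(e^{4\beta} - 1)^2 (2\beta + 1)}{4} f^2(x) + O\left( |e^{4\beta} - 1|^3 \right).
\]
Thus,
\[
\begin{aligned}
\mathrm{var}(Z_-^{t+1}) = {} & \beta^2 \frac{a+b}{2} - \frac{(e^{4\beta} - 1) \beta}{2} \left[ b\, \mathbb{E}\left[ f(Z_+^t + U) \right] + a\, \mathbb{E}\left[ f(Z_-^t - U) \right] \right] \\
& + \frac{(e^{4\beta} - 1)^2 (2\beta + 1)}{8} \left[ b\, \mathbb{E}\left[ f^2(Z_+^t + U) \right] + a\, \mathbb{E}\left[ f^2(Z_-^t - U) \right] \right] + O\left( b |e^{4\beta} - 1|^3 \right).
\end{aligned}
\]
Applying (37) and (38), we get that
\[
\mathrm{var}(Z_-^{t+1}) = \beta^2 \frac{a+b}{2} - \frac{(e^{4\beta} - 1) \beta b}{2} + \frac{(e^{4\beta} - 1)^2 (2\beta + 1) b}{8} \mathbb{E}\left[ f(Z_+^t + U) \right] + O\left( b |e^{4\beta} - 1|^3 \right).
\]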
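In view of (39), we have that
\[
\begin{aligned}
\beta^2 \frac{a+b}{2} - \frac{(e^{4\beta} - 1) \beta b}{2}
&= \frac{(a-b)^2 (a+b)}{8b^2} - \frac{(a-b)^2 (a+b)}{4b^2} + O\left( \frac{|a-b|^3}{b^2} \right) \\
&= -\frac{(a-b)^2}{4b} + O\left( \frac{|a-b|^3}{b^2} \right) = -\frac{\mu^2}{4} + O(a^{-1/2}),
\end{aligned}
\]
and that
\[
\frac{(e^{4\beta} - 1)^2 (2\beta + 1) b}{8} = \frac{(a-b)^2}{2b} + O\left( \frac{|a-b|^3}{b^2} \right) = \frac{\mu^2}{2} + O(a^{-1/2}).
\]
Moreover, we have shown that $b |e^{4\beta} - 1|^3 = O(a^{-1/2})$. Assembling the last three displayed equations gives that
\[
\mathrm{var}(Z_-^{t+1}) = -\frac{\mu^2}{4} + \frac{\mu^2}{2} \mathbb{E}\left[ f(Z_+^t + U) \right] + O(a^{-1/2}).
\]
Finally, in view of (40), we get that
\[
\begin{aligned}
\mathrm{var}(Z_-^{t+1}) &= -\frac{\mu^2}{4} + \frac{\mu^2}{2} \mathbb{E}\left[ \frac{1}{1 + e^{-2(Z_+^t + U)}} \right] + O(a^{-1/2}) \\
&= \frac{\mu^2}{4} \mathbb{E}\left[ \tanh(Z_+^t + U) \right] + O(a^{-1/2}).
\end{aligned}
\]
□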
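The limiting constants used in the proof above can be checked numerically. The following is a minimal sketch under the assumption that $a = b + \mu\sqrt{b}$, so that $(a-b)/\sqrt{b} = \mu$ exactly; the function name `scaled_errors` and the chosen values of $\mu$ and $b$ are illustrative and not part of the paper. It evaluates the four coefficients appearing in the mean and variance computations and reports their distance from $\mu^2/4$, $\mu^2/2$, $-\mu^2/4$ and $\mu^2/2$, multiplied by $\sqrt{b}$.

```python
import math

def scaled_errors(mu, b):
    # Assumption for this sketch: a is chosen so that (a - b) / sqrt(b) = mu.
    a = b + mu * math.sqrt(b)
    beta = 0.5 * math.log(a / b)
    A = math.exp(4 * beta) - 1.0  # equals a^2 / b^2 - 1

    mean_const = -beta * (a + b) / 2 + b * A / 4           # should approach mu^2 / 4
    mean_coeff = b * A * A / 8                              # should approach mu^2 / 2
    var_const = beta * beta * (a + b) / 2 - A * beta * b / 2  # should approach -mu^2 / 4
    var_coeff = A * A * (2 * beta + 1) * b / 8              # should approach mu^2 / 2

    values = (mean_const, mean_coeff, var_const, var_coeff)
    targets = (mu * mu / 4, mu * mu / 2, -mu * mu / 4, mu * mu / 2)
    # Multiply each error by sqrt(b): an O(b^{-1/2}) error stays bounded.
    return [abs(v - t) * math.sqrt(b) for v, t in zip(values, targets)]

for b in (1e4, 1e6):
    print(b, scaled_errors(1.5, b))
```

If the $O(a^{-1/2})$ error terms are correct, the printed scaled errors should remain bounded as $b$ grows, rather than increasing.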


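The following lemma is useful for proving that the distributions of $Z_+^t$ and $Z_-^t$ are approximately Gaussian.

Lemma 7.2. (Analog of Berry-Esseen inequality for Poisson sums [29, Theorem 3].) Let $S_\nu = X_1 + \cdots + X_{N_\nu}$, where $X_i : i \ge 1$ are independent, identically distributed random variables with finite second moment and $\mathbb{E}[|X_1|^3] \le \rho^3$, and, for some $\nu > 0$, $N_\nu$ is a $\mathrm{Pois}(\nu)$ random variable independent of $(X_i : i \ge 1)$. Then
\[
\sup_x \left| \mathbb{P}\left\{ \frac{S_\nu - \nu \mathbb{E}[X_1]}{\sqrt{\nu \mathbb{E}[X_1^2]}} \le x \right\} - \mathbb{P}\{Z \le x\} \right| \le \frac{C_{\rm BE}\, \rho^3}{\sqrt{\nu \left( \mathbb{E}[X_1^2] \right)^3}},
\]
where $C_{\rm BE} = 0.3041$.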


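Lemma 7.3. Suppose $\alpha \in (0, 1/2]$ is fixed. Let $h(v) = \mathbb{E}\left[ \tanh(v + \sqrt{v} Z + U) \right]$, where $Z \sim \mathcal{N}(0,1)$ and $U = \gamma$ with probability $1 - \alpha$ and $U = -\gamma$ with probability $\alpha$, where $\gamma = \frac{1}{2} \log \frac{1-\alpha}{\alpha}$. Define $(v_t : t \ge 0)$ recursively by $v_0 = 0$ and $v_{t+1} = \frac{\mu^2}{4} h(v_t)$. For any fixed $t \ge 0$, as $n \to \infty$,
\[
\sup_x \left| \mathbb{P}\left\{ \frac{Z_\pm^t \mp v_t}{\sqrt{v_t}} \le x \right\} - \mathbb{P}\{Z \le x\} \right| = O(a^{-1/2}). \tag{41}
\]
Define $(w_t : t \ge 1)$ recursively by $w_1 = \frac{\mu^2}{4}$ and $w_{t+1} = \frac{\mu^2}{4} h(w_t)$. For any fixed $t \ge 1$, as $n \to \infty$,
\[
\sup_x \left| \mathbb{P}\left\{ \frac{W_\pm^t \mp w_t}{\sqrt{w_t}} \le x \right\} - \mathbb{P}\{Z \le x\} \right| = O(a^{-1/2}). \tag{42}
\]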
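The two recursions in Lemma 7.3 are straightforward to iterate numerically. The sketch below is one possible way to do so, not the authors' code: it approximates the Gaussian expectation in $h$ by a trapezoid rule on $[-8, 8]$ (an assumption of this sketch), and the parameter values $\mu = 1.5$, $\alpha = 0.3$ as well as all function names are illustrative.

```python
import math

def gaussian_mean(g, half_width=8.0, n=1601):
    # Trapezoid-rule approximation of E[g(Z)] for Z ~ N(0, 1).
    step = 2 * half_width / (n - 1)
    total = 0.0
    for i in range(n):
        z = -half_width + i * step
        weight = step * math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
        if i == 0 or i == n - 1:
            weight *= 0.5
        total += weight * g(z)
    return total

def h(v, alpha):
    # h(v) = E[tanh(v + sqrt(v) Z + U)], U = +gamma w.p. 1 - alpha, -gamma w.p. alpha.
    gamma = 0.5 * math.log((1 - alpha) / alpha)
    root_v = math.sqrt(v)
    return gaussian_mean(
        lambda z: (1 - alpha) * math.tanh(v + root_v * z + gamma)
        + alpha * math.tanh(v + root_v * z - gamma)
    )

def density_evolution(mu, alpha, steps):
    c = mu * mu / 4
    v = [0.0]  # v[t] holds v_t for t = 0, 1, ...
    w = [c]    # w[k] holds w_{k+1} for k = 0, 1, ...
    for _ in range(steps):
        v.append(c * h(v[-1], alpha))
        w.append(c * h(w[-1], alpha))
    return v, w

v, w = density_evolution(mu=1.5, alpha=0.3, steps=30)
print(v[:4], w[:4])
```

In this sketch `v` starts at $v_0$ while `w` starts at $w_1$, so $v_t$ and $w_t$ are compared as `v[t]` and `w[t - 1]`.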
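Proof. We prove the lemma by induction over $t$. We first consider the base case. For $Z_\pm^t$, the base case $t = 0$ trivially holds, because $Z_\pm^0 = 0$ and $v_0 = 0$. For $W_\pm^t$, we need to check the base case $t = 1$. Recall that $\Lambda_u^0 = \infty$ if $\tau_u = +$ and $\Lambda_u^0 = -\infty$ if $\tau_u = -$. Notice that $F(\infty) = \beta$ and $F(-\infty) = -\beta$. Hence, $W_\pm^1 = \sum_{i=1}^{N_d} X_i$, where $N_d \sim \mathrm{Pois}(d)$ is independent of $\{X_i\}$; $\{X_i\}$ are i.i.d. such that conditional on $\tau_u$, $X_i = \tau_u \beta$ with probability $a/(a+b)$ and $X_i = -\tau_u \beta$ with probability $b/(a+b)$. Consequently, $\mathbb{E}[X_i \mid \tau_u] = \beta \theta \tau_u$, $\mathbb{E}[X_i^2] = \beta^2$ and $\mathbb{E}[|X_i|^3] = |\beta|^3$. Thus, in view of Lemma 7.2, we get that
\[
\sup_x \left| \mathbb{P}\left\{ \frac{W_\pm^1 \mp d\beta\theta}{\sqrt{d\beta^2}} \le x \right\} - \mathbb{P}\{Z \le x\} \right| \le C_{\rm BE}\, d^{-1/2}. \tag{43}
\]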


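Since
\[
\beta = \frac{a-b}{2b} - \frac{(a-b)^2}{4b^2} + O\left( \frac{|a-b|^3}{b^3} \right),
\]
it follows that $d\beta\theta = \mu^2/4 + O(a^{-1/2})$ and $d\beta^2 = \mu^2/4 + O(a^{-1/2})$. Note that by definition, $w_1 = \mu^2/4$. For any $x \in \mathbb{R}$, define $x'_\pm$ such that $\sqrt{d\beta^2}\, x'_\pm \pm d\beta\theta = x \sqrt{w_1} \pm w_1$. Then
\[
\left\{ \frac{W_\pm^1 \mp w_1}{\sqrt{w_1}} \le x \right\} = \left\{ \frac{W_\pm^1 \mp d\beta\theta}{\sqrt{d\beta^2}} \le x'_\pm \right\}.
\]
Hence, writing $x'$ for $x'_\pm$,
\[
\begin{aligned}
\sup_x \left| \mathbb{P}\left\{ \frac{W_\pm^1 \mp w_1}{\sqrt{w_1}} \le x \right\} - \mathbb{P}\{Z \le x\} \right|
&= \sup_x \left| \mathbb{P}\left\{ \frac{W_\pm^1 \mp d\beta\theta}{\sqrt{d\beta^2}} \le x' \right\} - \mathbb{P}\{Z \le x\} \right| \\
&\le \sup_x \left| \mathbb{P}\left\{ \frac{W_\pm^1 \mp d\beta\theta}{\sqrt{d\beta^2}} \le x' \right\} - \mathbb{P}\{Z \le x'\} \right| + \sup_x \left| \mathbb{P}\{Z \le x'\} - \mathbb{P}\{Z \le x\} \right| \\
&\overset{(a)}{\le} O(d^{-1/2}) + \sup_x \left| \mathbb{P}\{Z \le x'\} - \mathbb{P}\{Z \le x\} \right| \\
&\le O(d^{-1/2}),
\end{aligned} \tag{44}
\]
where (a) holds due to (43); the last inequality holds because $|x' - x| \le O\left( (|x| + 1) d^{-1/2} \right)$ and hence
\[
\sup_x \left| \mathbb{P}\{Z \le x'\} - \mathbb{P}\{Z \le x\} \right| \le \sup_x |x' - x| \max\left\{ e^{-x^2/2}, e^{-(x')^2/2} \right\} = O(d^{-1/2}).
\]
Therefore, (42) holds for $t = 1$.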

In view of (31) and (32), $Z_\pm^t$ and $W_\pm^t$ satisfy the same recursion. Moreover, by definition, $v_t$ and $w_t$ also satisfy the same recursion. Thus, to finish the proof of the lemma, it suffices to show that: suppose (41) holds for a fixed $t$, then it also holds for $t + 1$. Also, by symmetry, $Z_+^{t+1}$ has the same distribution as $-Z_-^{t+1}$, so it is enough to show (41) holds for $Z_-^{t+1}$.

Notice that $Z_-^{t+1} = \sum_{i=1}^{N_d} Y_i$, where $N_d \sim \mathrm{Pois}(d)$ is independent of $\{Y_i\}$; $\{Y_i\}$ are i.i.d. such that $Y_i = F(Z_+^t + U)$ with probability $b/(a+b)$ and $Y_i = F(Z_-^t - U)$ with probability $a/(a+b)$.
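Thus, $\mathbb{E}[Z_-^{t+1}] = d\, \mathbb{E}[Y_1]$ and $\mathrm{var}(Z_-^{t+1}) = d\, \mathbb{E}[Y_1^2]$. In view of Lemma 7.2, we get that
\[
\sup_x \left| \mathbb{P}\left\{ \frac{Z_-^{t+1} - d\, \mathbb{E}[Y_1]}{\sqrt{d\, \mathbb{E}[Y_1^2]}} \le x \right\} - \mathbb{P}\{Z \le x\} \right| \le O\left( \frac{d\, \mathbb{E}[|Y_1|^3]}{\left( d\, \mathbb{E}[Y_1^2] \right)^{3/2}} \right). \tag{45}
\]
It follows from Lemma 7.1 that
\[
\begin{aligned}
d\, \mathbb{E}[Y_1] &= -\frac{\mu^2}{4} \mathbb{E}\left[ \tanh(Z_+^t + U) \right] + O(a^{-1/2}), \\
d\, \mathbb{E}[Y_1^2] &= \frac{\mu^2}{4} \mathbb{E}\left[ \tanh(Z_+^t + U) \right] + O(a^{-1/2}).
\end{aligned}
\]
Using the area rule of expectation, we have that
\[
\begin{aligned}
\mathbb{E}\left[ \tanh(Z_+^t + U) \right]
&= \int_0^\infty \tanh'(s)\, \mathbb{P}\{Z_+^t + U \ge s\}\, ds - \int_{-\infty}^0 \tanh'(s)\, \mathbb{P}\{Z_+^t + U \le s\}\, ds \\
&= \int_0^\infty \tanh'(s)\, \mathbb{P}\{v_t + \sqrt{v_t} Z + U \ge s\}\, ds - \int_{-\infty}^0 \tanh'(s)\, \mathbb{P}\{v_t + \sqrt{v_t} Z + U \le s\}\, ds + O(a^{-1/2}) \\
&= \mathbb{E}\left[ \tanh(v_t + \sqrt{v_t} Z + U) \right] + O(a^{-1/2}),
\end{aligned}
\]
where the second equality follows from the induction hypothesis and the fact that $|\tanh'(s)| \le 1$ with $\int_{-\infty}^{\infty} \tanh'(s)\, ds = 2$. Hence, $d\, \mathbb{E}[Y_1] = -v_{t+1} + O(a^{-1/2})$ and $d\, \mathbb{E}[Y_1^2] = v_{t+1} + O(a^{-1/2})$. Moreover, since $F$ is monotone, it follows that $|F(x)| \le \max\{|F(\infty)|, |F(-\infty)|\} = |\beta|$ and thus $d\, \mathbb{E}[|Y_1|^3] \le d |\beta|^3 = O(a^{-1/2})$. As a consequence, in view of (45) and following the similar argument as (44), we get that (41) holds for $Z_-^{t+1}$. □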


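We are about to prove Theorem 2.4 based on Lemma 7.3. Before that, we need a lemma showing that $h$ is monotone.

Lemma 7.4. $h(v)$ is continuous on $[0, \infty)$ and $0 \le h'(v) \le 1$ for $v \in (0, +\infty)$.

Proof. By definition,
\[
h(v) = (1 - \alpha)\, \mathbb{E}\left[ \tanh\left( v + \sqrt{v} Z + \frac{1}{2} \log \frac{1-\alpha}{\alpha} \right) \right] + \alpha\, \mathbb{E}\left[ \tanh\left( v + \sqrt{v} Z - \frac{1}{2} \log \frac{1-\alpha}{\alpha} \right) \right].
\]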
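Since $|\tanh(x)| \le 1$, the continuity of $h$ follows from the dominated convergence theorem. We next show $h'(v)$ exists for $v \in (0, \infty)$. Fix $c \in \mathbb{R}$ and let $g(v) = \mathbb{E}\left[ \tanh(v + \sqrt{v} Z + c) \right]$. Notice that $\tanh'(x + \sqrt{x} Z + c) = \left( 1 - \tanh^2(x + \sqrt{x} Z + c) \right) \left( 1 + x^{-1/2} Z / 2 \right)$ for $x \in (0, \infty)$, where the prime denotes the derivative with respect to $x$, and
\[
\left| \left( 1 - \tanh^2(x + \sqrt{x} Z + c) \right) \left( 1 + x^{-1/2} Z / 2 \right) \right| \le 1 + x^{-1/2} |Z| / 2.
\]
Since $|Z|$ is integrable, by the dominated convergence theorem, $\mathbb{E}\left[ \tanh'(x + \sqrt{x} Z + c) \right]$ exists and is continuous in $x$. Therefore, $x \mapsto \mathbb{E}\left[ \tanh'(x + \sqrt{x} Z + c) \right]$ is integrable over $x \in (0, \infty)$. It follows that
\[
g(v) = \mathbb{E}\left[ \tanh(c) + \int_0^v \tanh'(x + \sqrt{x} Z + c)\, dx \right] = \tanh(c) + \int_0^v \mathbb{E}\left[ \tanh'(x + \sqrt{x} Z + c) \right] dx,
\]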
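where the second equality holds due to Fubini's theorem. Hence,
\[
g'(v) = \mathbb{E}\left[ \left( 1 - \tanh^2(v + \sqrt{v} Z + c) \right) \left( 1 + v^{-1/2} Z / 2 \right) \right].
\]
Using integration by parts, we get that
\[
\begin{aligned}
\mathbb{E}\left[ \left( 1 - \tanh^2(v + \sqrt{v} Z + c) \right) \sqrt{v} Z \right]
&= \int_{-\infty}^{\infty} \left( 1 - \tanh^2(v + x + c) \right) \frac{x}{\sqrt{2\pi v}} e^{-x^2/(2v)}\, dx \\
&= -v \int_{-\infty}^{\infty} \left( 1 - \tanh^2(v + x + c) \right) d\left( \frac{1}{\sqrt{2\pi v}} e^{-x^2/(2v)} \right) \\
&= -v \left( 1 - \tanh^2(v + x + c) \right) \frac{e^{-x^2/(2v)}}{\sqrt{2\pi v}} \Bigg|_{-\infty}^{+\infty} + v \int_{-\infty}^{\infty} \frac{d}{dx}\left( 1 - \tanh^2(v + x + c) \right) \frac{e^{-x^2/(2v)}}{\sqrt{2\pi v}}\, dx \\
&= -2v\, \mathbb{E}\left[ \tanh(v + \sqrt{v} Z + c) \left( 1 - \tanh^2(v + \sqrt{v} Z + c) \right) \right].
\end{aligned}
\]
The last two displayed equations yield that
\[
g'(v) = \mathbb{E}\left[ \left( 1 - \tanh(\sqrt{v} Z + v + c) \right) \left( 1 - \tanh^2(\sqrt{v} Z + v + c) \right) \right].
\]
It follows that
\[
h'(v) = \mathbb{E}\left[ \left( 1 - \tanh(\sqrt{v} Z + v + U) \right) \left( 1 - \tanh^2(\sqrt{v} Z + v + U) \right) \right].
\]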

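Thus $h'(v) \ge 0$. Finally, we show $h'(v) \le 1$. We need the following equality: for $k \in \mathbb{N}$,
\[
\mathbb{E}\left[ \tanh^{2k}(\sqrt{v} Z + v + U) \right] = \mathbb{E}\left[ \tanh^{2k-1}(\sqrt{v} Z + v + U) \right], \tag{46}
\]
which immediately implies that
\[
h'(v) = \mathbb{E}\left[ \left( 1 - \tanh^2(\sqrt{v} Z + v + U) \right)^2 \right] \le 1.
\]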


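To prove (46), we need to introduce the notion of symmetric random variables [43, 34]. A random variable $X$ is said to be symmetric if it takes values in $(-\infty, +\infty)$ and
\[
\mathbb{E}[g(X)] = \mathbb{E}\left[ g(-X)\, e^{-2X} \right], \tag{47}
\]
for any real function $g$ such that at least one of the expectation values exists. It is easy to check by definition that $\sqrt{v} Z + v$ and $U$ are symmetric. Moreover, one can check that a sum of two independent, symmetric random variables is symmetric. Thus, $\sqrt{v} Z + v + U$ is symmetric. As shown in [34, Lemma 3], if $X$ is symmetric, then $\mathbb{E}\left[ \tanh^{2k}(X) \right] = \mathbb{E}\left[ \tanh^{2k-1}(X) \right]$. Specifically, by plugging $g(x) = \tanh^{2k}(x)$ and $g(x) = \tanh^{2k-1}(x)$ into (47), we have that
\[
\begin{aligned}
\mathbb{E}\left[ \tanh^{2k}(X) \right] &= \mathbb{E}\left[ \tanh^{2k}(-X)\, e^{-2X} \right] = \mathbb{E}\left[ \tanh^{2k}(X)\, e^{-2X} \right], \\
\mathbb{E}\left[ \tanh^{2k-1}(X) \right] &= \mathbb{E}\left[ \tanh^{2k-1}(-X)\, e^{-2X} \right] = -\mathbb{E}\left[ \tanh^{2k-1}(X)\, e^{-2X} \right].
\end{aligned}
\]
It follows that
\[
\mathbb{E}\left[ \tanh^{2k}(X) \right] = \frac{1}{2} \mathbb{E}\left[ \tanh^{2k}(X) \left( 1 + e^{-2X} \right) \right] = \frac{1}{2} \mathbb{E}\left[ \tanh^{2k-1}(X) \left( 1 - e^{-2X} \right) \right] = \mathbb{E}\left[ \tanh^{2k-1}(X) \right].
\]
□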
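Both the moment identity (46) and the resulting formula $h'(v) = \mathbb{E}[(1 - \tanh^2(\sqrt{v}Z + v + U))^2]$ can be spot-checked numerically. The sketch below is only an illustration: the trapezoid-rule quadrature, the finite-difference step `eps`, the test point $v = 0.8$, $\alpha = 0.3$, and the helper name `mean_over_X` are all assumptions made here, not part of the paper.

```python
import math

def mean_over_X(g, v, alpha, half_width=8.0, n=1601):
    # Trapezoid-rule approximation of E[g(X)] for X = sqrt(v) Z + v + U,
    # with Z ~ N(0, 1), U = +gamma w.p. 1 - alpha and U = -gamma w.p. alpha.
    gamma = 0.5 * math.log((1 - alpha) / alpha)
    step = 2 * half_width / (n - 1)
    root_v = math.sqrt(v)
    total = 0.0
    for i in range(n):
        z = -half_width + i * step
        weight = step * math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
        if i == 0 or i == n - 1:
            weight *= 0.5
        x = root_v * z + v
        total += weight * ((1 - alpha) * g(x + gamma) + alpha * g(x - gamma))
    return total

v, alpha = 0.8, 0.3

# Identity (46): even and odd moments of tanh(X) coincide for k = 1, 2, 3.
even_odd_gaps = [
    abs(
        mean_over_X(lambda x, k=k: math.tanh(x) ** (2 * k), v, alpha)
        - mean_over_X(lambda x, k=k: math.tanh(x) ** (2 * k - 1), v, alpha)
    )
    for k in (1, 2, 3)
]

def h(v_arg):
    return mean_over_X(math.tanh, v_arg, alpha)

# Central finite difference for h'(v) versus the closed-form expectation.
eps = 1e-5
h_prime_fd = (h(v + eps) - h(v - eps)) / (2 * eps)
h_prime_formula = mean_over_X(lambda x: (1 - math.tanh(x) ** 2) ** 2, v, alpha)

print(even_odd_gaps, h_prime_fd, h_prime_formula)
```

Agreement of the two derivative values, and a value between 0 and 1, is consistent with Lemma 7.4 at the chosen point; it is of course not a proof.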
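Proof of Theorem 2.4. In view of Lemma 7.3,
\[
\lim_{n \to \infty} \mathbb{P}\left\{ \Gamma_u^t \ge 0 \mid \tau_u = - \right\} = \lim_{n \to \infty} \mathbb{P}\left\{ \Gamma_u^t \le 0 \mid \tau_u = + \right\} = (1 - \alpha)\, Q\left( \frac{v_t + \gamma}{\sqrt{v_t}} \right) + \alpha\, Q\left( \frac{v_t - \gamma}{\sqrt{v_t}} \right).
\]
Hence, it follows from Lemma 3.7 that
\[
\lim_{n \to \infty} p_{G_n}(\hat{\sigma}_{\rm BP}^t) = \lim_{n \to \infty} q_{T^t} = 1 - \mathbb{E}\left[ Q\left( \frac{v_t + U}{\sqrt{v_t}} \right) \right].
\]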

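We prove that $v_{t+1} \ge v_t$ for $t \ge 0$ by induction. Recall that $v_0 = 0 \le v_1 = (1 - 2\alpha)^2 \mu^2 / 4 = \mu^2 h(v_0) / 4$. Suppose $v_{t+1} \ge v_t$ holds; we shall show the claim also holds for $t + 1$. In particular, since $h$ is continuous on $[0, \infty)$ and differentiable on $(0, \infty)$, it follows from the mean value theorem that
\[
v_{t+2} - v_{t+1} = \frac{\mu^2}{4} \left( h(v_{t+1}) - h(v_t) \right) = \frac{\mu^2}{4} h'(x) (v_{t+1} - v_t),
\]
for some $x \in (v_t, v_{t+1})$. Since Lemma 7.4 implies that $h'(x) \ge 0$ for $x \in (0, \infty)$, it follows that $v_{t+2} \ge v_{t+1}$. Hence, $v_t$ is non-decreasing in $t$. Next we argue that $v_t \le \underline{v}$ for all $t \ge 0$ by induction, where $\underline{v}$ is the smallest fixed point of $v = \frac{\mu^2}{4} h(v)$. For the base case, $v_0 = 0 \le \underline{v}$. If $v_t \le \underline{v}$, then by the monotonicity of $h$, $v_{t+1} = \frac{\mu^2}{4} h(v_t) \le \frac{\mu^2}{4} h(\underline{v}) = \underline{v}$. Thus, $\lim_{t \to \infty} v_t$ exists and $\lim_{t \to \infty} v_t = \underline{v}$. Therefore,
\[
\lim_{t \to \infty} \lim_{n \to \infty} p_{G_n}(\hat{\sigma}_{\rm BP}^t) = \lim_{t \to \infty} \lim_{n \to \infty} q_{T^t} = 1 - \mathbb{E}\left[ Q\left( \frac{\underline{v} + U}{\sqrt{\underline{v}}} \right) \right].
\]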


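Next, we prove the claim for $p^*_{G_n}$. In view of Lemma 7.3,
\[
\lim_{n \to \infty} \mathbb{P}\left\{ \Lambda_u^t \ge 0 \mid \tau_u = - \right\} = \lim_{n \to \infty} \mathbb{P}\left\{ \Lambda_u^t \le 0 \mid \tau_u = + \right\} = (1 - \alpha)\, Q\left( \frac{w_t + \gamma}{\sqrt{w_t}} \right) + \alpha\, Q\left( \frac{w_t - \gamma}{\sqrt{w_t}} \right).
\]
Hence, it follows from Lemma 3.9 that
\[
\limsup_{n \to \infty} p^*_{G_n} \le \lim_{n \to \infty} p_{T^t} = 1 - \mathbb{E}\left[ Q\left( \frac{w_t + U}{\sqrt{w_t}} \right) \right].
\]
Recall that $w_1 = \mu^2/4 \ge w_t$. By the same argument used to prove that $v_t$ is non-decreasing, one can show that $w_t$ is non-increasing in $t$. Also, by the same argument used to prove that $v_t$ is upper bounded by $\underline{v}$, one can show that $w_t$ is lower bounded by $\overline{v}$, where $\overline{v}$ is the largest fixed point of $v = \frac{\mu^2}{4} h(v)$. Thus, $\lim_{t \to \infty} w_t$ exists and $\lim_{t \to \infty} w_t = \overline{v}$. Therefore,
\[
\lim_{t \to \infty} \limsup_{n \to \infty} p^*_{G_n} \le \lim_{t \to \infty} \lim_{n \to \infty} p_{T^t} = 1 - \mathbb{E}\left[ Q\left( \frac{\overline{v} + U}{\sqrt{\overline{v}}} \right) \right].
\]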

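Finally, notice that
\[
w_{t+1} - v_{t+1} = \frac{\mu^2}{4} \left( h(w_t) - h(v_t) \right) \le \frac{\mu^2}{4} (w_t - v_t),
\]
where the last inequality holds because $0 \le h'(x) \le 1$. If $|\mu| < 2$, then $\mu^2/4 \le 1 - \epsilon$ for some $\epsilon > 0$. Hence, $(w_{t+1} - v_{t+1}) \le (1 - \epsilon)(w_t - v_t)$. Since $w_1 - v_1 = \mu^2 \alpha (1 - \alpha)$, it follows that $\lim_{t \to \infty} (w_t - v_t) = 0$ and thus $\underline{v} = \overline{v}$. If instead $|\mu| \ge C$ for some sufficiently large constant $C$ or $\alpha \le \alpha^*$ for some sufficiently small constant $0 < \alpha^* < 1/2$, then it follows from Theorem 5.1 and Theorem 6.1 that $\lim_{t \to \infty} \lim_{n \to \infty} p_{T^t} = \lim_{t \to \infty} \lim_{n \to \infty} q_{T^t}$. As a consequence, $\underline{v} = \overline{v}$.

□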
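For a concrete feel of the regime $|\mu| < 2$, the following sketch iterates both recursions from the proof, tracks the gap $w_t - v_t$, and evaluates the limiting expression $1 - \mathbb{E}[Q((v + U)/\sqrt{v})]$ at the numerically converged value. It is a minimal illustration under stated assumptions: the trapezoid-rule quadrature, the parameter values $\mu = 1.5$ and $\alpha = 0.3$, the iteration count, and the function names are choices made here, and $Q$ is taken to be the standard Gaussian tail function.

```python
import math

def h(v, alpha, half_width=8.0, n=1601):
    # Trapezoid-rule approximation of h(v) = E[tanh(v + sqrt(v) Z + U)].
    gamma = 0.5 * math.log((1 - alpha) / alpha)
    step = 2 * half_width / (n - 1)
    root_v = math.sqrt(v)
    total = 0.0
    for i in range(n):
        z = -half_width + i * step
        weight = step * math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
        if i == 0 or i == n - 1:
            weight *= 0.5
        x = v + root_v * z
        total += weight * ((1 - alpha) * math.tanh(x + gamma) + alpha * math.tanh(x - gamma))
    return total

def Q(x):
    # Standard Gaussian tail probability P{Z >= x}.
    return 0.5 * math.erfc(x / math.sqrt(2))

def limiting_accuracy(v, alpha):
    # 1 - E[Q((v + U) / sqrt(v))] for v > 0.
    gamma = 0.5 * math.log((1 - alpha) / alpha)
    root_v = math.sqrt(v)
    return 1.0 - ((1 - alpha) * Q((v + gamma) / root_v) + alpha * Q((v - gamma) / root_v))

mu, alpha = 1.5, 0.3          # illustrative values with |mu| < 2
c = mu * mu / 4
v_t = c * h(0.0, alpha)       # v_1
w_t = c                       # w_1
gaps = [w_t - v_t]            # gaps[k] = w_{k+1} - v_{k+1}
for _ in range(60):
    v_t, w_t = c * h(v_t, alpha), c * h(w_t, alpha)
    gaps.append(w_t - v_t)

acc = limiting_accuracy(v_t, alpha)
print(gaps[0], gaps[-1], acc)
```

The first gap should equal $\mu^2\alpha(1-\alpha)$, each later gap should shrink by at least the factor $\mu^2/4$, and the resulting accuracy should be no worse than the $1 - \alpha$ achieved by the side information alone.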